To the Student


Linear algebra comprises a variety of topics and viewpoints, including 
computational machinery (matrices), abstract objects (vector spaces), 
mappings (linear transformations), and the “fine structure” of linear 
transformations (diagonalizability). 

Like all mathematical subjects at an introductory level, linear al- 
gebra is driven by examples and comprehended by unfamiliar theory. 
Each of the preceding topics can be difficult to assimilate until the 
others are understood. You, the reader, consequently face a chicken- 
and-egg problem: Examples appear unconnected without a theoretical 
framework, but theory without examples tends to be dry and unmoti- 
vated. 

This preface sketches an overview of the entire book by looking at
a family of representative examples. The “universe” is the Cartesian
plane $\mathbf{R}^2$, whose coordinates we denote $(x^1, x^2)$. (The use of indices
instead of the more familiar $(x, y)$ will economize our use of letters,
particularly when we begin to study functions of arbitrarily many
variables. The use of superscripts as indices rather than as exponents
highlights important, subtle structure in formulas.) We view ordered
pairs as individual entities, and write $x = (x^1, x^2)$.


Vectors and Vector Spaces 


We view an ordered pair $x$ as a vector, a type of object that can be
added to another vector, or multiplied by a real constant (called a
scalar) to obtain another vector. If $x = (x^1, x^2)$ and $y = (y^1, y^2)$, and
if $c$ is a scalar, we define

\[ x + y = (x^1 + y^1, x^2 + y^2), \qquad cx = (cx^1, cx^2). \]


The set $\mathbf{R}^2$ equipped with these operations is said to be a vector space.


The vector $x = (x^1, x^2)$ in the plane may be viewed geometrically
as the arrow with its tail at the origin $0 = (0,0)$ and its tip at the
point $x$. Vector addition corresponds to forming the parallelogram
with sides $x$ and $y$, and taking $x + y$ to be the far corner. Scalar
multiplication $cx$ corresponds to “stretching” $x$ by a factor of $c$ if $c > 0$,
or to stretching by a factor of $|c|$ and reversing the direction if $c < 0$.

[Figure: vectors in the plane, with $y = (y^1, y^2)$ labeled.]


Linear Transformations 


In linear algebra, most mappings send vector spaces to vector spaces. 
The special properties dictated by linear algebra may be written 


\[ S(x + y) = S(x) + S(y), \qquad S(cx) = cS(x) \qquad \text{for all vectors } x, y, \text{ all scalars } c. \]

For technical convenience, these conditions are often expressed as a
single condition

\[ S(cx + y) = cS(x) + S(y) \qquad \text{for all vectors } x, y, \text{ all scalars } c. \]

A mapping $S$ satisfying this condition is called a linear transformation.

Geometrically, if $x$ and $y$ are arbitrary vectors and $c$ is a scalar, so
that $cx + y$ is the far corner of the parallelogram with sides $cx$ and $y$,
then $S(cx + y) = cS(x) + S(y)$ is the far corner of the parallelogram with
sides $S(cx) = cS(x)$ and $S(y)$. A linear transformation $S$ therefore
maps an arbitrary parallelogram to a parallelogram in an obvious (and
restrictive) sense.

The special vectors $e_1 = (1,0)$ and $e_2 = (0,1)$ are the standard
basis of $\mathbf{R}^2$. Every vector in $\mathbf{R}^2$ can be expressed uniquely as a linear
combination:

\[
\begin{aligned}
x = (x^1, x^2) &= (x^1, 0) + (0, x^2) \\
               &= x^1 (1,0) + x^2 (0,1) \\
               &= x^1 e_1 + x^2 e_2.
\end{aligned}
\]

Formally, a linear transformation “distributes” over an arbitrary linear
combination. In detail, if $S : \mathbf{R}^2 \to \mathbf{R}^2$ is a linear transformation,
repeated application of the defining properties gives

\[
\begin{aligned}
S(x) &= S(x^1 e_1 + x^2 e_2) \\
     &= S(x^1 e_1) + S(x^2 e_2) \\
     &= x^1 S(e_1) + x^2 S(e_2).
\end{aligned}
\]

This innocuous equation expresses a remarkable conclusion: A linear
transformation $S : \mathbf{R}^2 \to \mathbf{R}^2$ is completely determined by its values
$S(e_1)$, $S(e_2)$ at two vectors.
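
As an illustration, the following Python sketch builds a mapping of the plane from two prescribed values at the standard basis vectors and checks the single linearity condition on one sample. The helper name `linear_map_from_basis_images` and the numerical values are hypothetical, chosen only for this example.

```python
def linear_map_from_basis_images(s_e1, s_e2):
    """Return the map S defined by S(x) = x^1 * S(e_1) + x^2 * S(e_2)."""
    def S(x):
        x1, x2 = x
        return (x1 * s_e1[0] + x2 * s_e2[0],
                x1 * s_e1[1] + x2 * s_e2[1])
    return S

# Hypothetical choice of S(e_1) and S(e_2), for illustration only.
S = linear_map_from_basis_images((2, 1), (-1, 3))

x, y, c = (1, 2), (4, -5), 3
cx_plus_y = (c * x[0] + y[0], c * x[1] + y[1])

lhs = S(cx_plus_y)                                       # S(cx + y)
rhs = (c * S(x)[0] + S(y)[0], c * S(x)[1] + S(y)[1])     # cS(x) + S(y)
print(lhs == rhs)  # True
```

A single numerical check is of course not a proof; it only shows the formula at work.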


Matrix Representation 


To study vector spaces and linear transformations in greater detail, we 
will represent vectors and linear transformations as rectangular arrays 
of numbers, called matrices. The first chapter of the book introduces 
matrix notation, a central piece of computational machinery in linear 
algebra. Here we focus on motivation and geometric intuition, using 
the special case of linear transformations from the plane to the plane. 

We use the notational convention in which vectors are written as
columns:

\[ x = (x^1, x^2) = \begin{bmatrix} x^1 \\ x^2 \end{bmatrix}. \]

With this notation, a linear transformation $S : \mathbf{R}^2 \to \mathbf{R}^2$ is completely
and uniquely specified by scalars $A^i_j$ satisfying

\[
S(e_1) = A^1_1 e_1 + A^2_1 e_2 = \begin{bmatrix} A^1_1 \\ A^2_1 \end{bmatrix}, \qquad
S(e_2) = A^1_2 e_1 + A^2_2 e_2 = \begin{bmatrix} A^1_2 \\ A^2_2 \end{bmatrix}.
\]

The (standard) matrix of $S$ assembles these into a rectangular array $A$:

\[
A = \begin{bmatrix} S(e_1) & S(e_2) \end{bmatrix}
  = \begin{bmatrix} A^1_1 & A^1_2 \\ A^2_1 & A^2_2 \end{bmatrix}.
\]

The real number $A^i_j$ in the $i$th row and $j$th column of $A$ is called the
$(i,j)$-entry, and encodes the dependence of the $i$th output variable on
the $j$th input variable.


Matrix Multiplication 


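For all $x$, we have

\[
S(x) = x^1 S(e_1) + x^2 S(e_2)
     = \begin{bmatrix} A^1_1 x^1 + A^1_2 x^2 \\ A^2_1 x^1 + A^2_2 x^2 \end{bmatrix}.
\]

The expression on the right may be interpreted as a type of “product”
of the matrix of $S$ and the column vector of $x$:

\[
\begin{bmatrix} y^1 \\ y^2 \end{bmatrix}
= \begin{bmatrix} A^1_1 x^1 + A^1_2 x^2 \\ A^2_1 x^1 + A^2_2 x^2 \end{bmatrix}
= \begin{bmatrix} A^1_1 & A^1_2 \\ A^2_1 & A^2_2 \end{bmatrix}
  \begin{bmatrix} x^1 \\ x^2 \end{bmatrix},
\]

or simply $y = Ax$. The second equality defines the product of the
“$2 \times 2$ square matrix $A$” and the “$2 \times 1$ column matrix $x$”.

Particularly when the number of variables is large, sigma (summation)
notation comes into its own, both condensing common expressions
and highlighting their structure. The relationship between the
inputs $x^j$ and the outputs $y^i$ of a linear transformation may be written*

\[ y^i = \sum_{j=1}^{2} A^i_j x^j. \]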
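
The summation formula translates directly into code. The sketch below stores a matrix as a list of rows and computes each output as a sum over the inputs; the function name `apply_matrix` and the sample entries are assumptions made for illustration.

```python
def apply_matrix(A, x):
    """Return y with y^i = sum over j of A^i_j * x^j (A is a list of rows)."""
    return [sum(A[i][j] * x[j] for j in range(len(x)))
            for i in range(len(A))]

A = [[2, -1],
     [1,  3]]   # hypothetical entries, chosen only for illustration
x = [1, 2]

y = apply_matrix(A, x)
print(y)  # [0, 7]
```

Note that Python lists are indexed from 0, while the text indexes rows and columns from 1.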


Composition of Linear Transformations 
If $T : \mathbf{R}^2 \to \mathbf{R}^2$ is a linear transformation with matrix

\[ B = \begin{bmatrix} B^1_1 & B^1_2 \\ B^2_1 & B^2_2 \end{bmatrix}, \]

the composition $T \circ S : \mathbf{R}^2 \to \mathbf{R}^2$, defined by $(T \circ S)(x) = T(S(x))$, is
easily shown to be a linear transformation (check this). The associated
matrix is a “product” of the matrices $B$ and $A$ of the transformations $T$
and $S$. To determine the entries of this product, note that by definition,

\[
\begin{aligned}
T(e_1) &= B^1_1 e_1 + B^2_1 e_2, & S(e_1) &= A^1_1 e_1 + A^2_1 e_2, \\
T(e_2) &= B^1_2 e_1 + B^2_2 e_2, & S(e_2) &= A^1_2 e_1 + A^2_2 e_2.
\end{aligned}
\]

Consequently,

\[
\begin{aligned}
TS(e_1) &= T(A^1_1 e_1 + A^2_1 e_2) \\
        &= A^1_1 T(e_1) + A^2_1 T(e_2) \\
        &= A^1_1 (B^1_1 e_1 + B^2_1 e_2) + A^2_1 (B^1_2 e_1 + B^2_2 e_2) \\
        &= (B^1_1 A^1_1 + B^1_2 A^2_1) e_1 + (B^2_1 A^1_1 + B^2_2 A^2_1) e_2;
\end{aligned}
\]

similarly (check this),

\[ TS(e_2) = (B^1_1 A^1_2 + B^1_2 A^2_2) e_1 + (B^2_1 A^1_2 + B^2_2 A^2_2) e_2. \]

Since the coefficients of $TS(e_1)$ give the first column of the matrix of $TS$
and the coefficients of $TS(e_2)$ give the second column of the matrix, we
are led to define the matrix product by

\[
BA = \begin{bmatrix} B^1_1 & B^1_2 \\ B^2_1 & B^2_2 \end{bmatrix}
     \begin{bmatrix} A^1_1 & A^1_2 \\ A^2_1 & A^2_2 \end{bmatrix}
   = \begin{bmatrix}
       B^1_1 A^1_1 + B^1_2 A^2_1 & B^1_1 A^1_2 + B^1_2 A^2_2 \\
       B^2_1 A^1_1 + B^2_2 A^2_1 & B^2_1 A^1_2 + B^2_2 A^2_2
     \end{bmatrix}.
\]

This forbidding collection of formulas is clarified by summation notation:

\[ (BA)^i_j = \sum_{k=1}^{2} B^i_k A^k_j. \]

The preceding equation has precisely the same form for matrices of
arbitrary size, and furnishes our general definition of matrix multiplication.
(Convince yourself that the entries of $BA$ are given by the preceding
summation formula.)

*Physicists sometimes go further, omitting the summation sign and implicitly
summing over every index that appears both as a subscript and as a superscript:
$y^i = A^i_j x^j$. This book does not use this “Einstein summation convention”, but
the possibility of doing so explains our use of superscripts as indices, despite the
potential risk of reading superscripts as exponents. Exponents appear so rarely in
linear algebra that mentioning them at each occurrence is feasible.

When working computationally with specific matrices, the formula
is generally less important than the procedure encoded by the formula.
First, define the “product” of a “row” and a “column” by

\[
\begin{bmatrix} b_1 & b_2 \end{bmatrix}
\begin{bmatrix} a^1 \\ a^2 \end{bmatrix}
= \begin{bmatrix} b_1 a^1 + b_2 a^2 \end{bmatrix}.
\]

Now, to find the $(i,j)$-entry of the product $BA$, multiply the $i$th row
of $B$ by the $j$th column of $A$. For example, to find the entry in the
first row and second column of $BA$, multiply the first row of $B$ by the
second column of $A$.
Geometry of Linear Transformations


Consider the linear transformation $S$ that rotates $\mathbf{R}^2$ about the origin
by [angle missing], and $T$ that shears the plane horizontally by one unit:

[Figure: the standard basis vectors and their images under $S$ and $T$.]

The matrix of each may be read off the images of the standard basis
vectors. Thus,

[Equation: the matrices of $S$ and $T$.]

The composite transformations $TS$ (rotate, then shear) and $ST$
(shear, then rotate) are linear, and their matrices may be found by
matrix multiplication:

[Equation: the matrices of $TS$ and $ST$.]

Note carefully that $TS \neq ST$. Composition of linear transformations,
and consequently multiplication of square matrices, is generally a
non-commutative operation.
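
To see non-commutativity numerically, here is a small Python sketch. It assumes, purely for illustration, that the rotation is a counterclockwise quarter turn (not necessarily the angle used in the text), so that every entry is an integer; the shear is the horizontal shear by one unit. The helper `matmul` is a hypothetical name.

```python
def matmul(B, A):
    """Matrix product with (BA)^i_j = sum over k of B^i_k * A^k_j."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]

# Assumption: S is a quarter-turn rotation; its columns are S(e_1), S(e_2).
A = [[0, -1],
     [1,  0]]
# T is the horizontal shear by one unit: T(e_1) = e_1, T(e_2) = e_1 + e_2.
B = [[1, 1],
     [0, 1]]

TS = matmul(B, A)   # rotate, then shear
ST = matmul(A, B)   # shear, then rotate
print(TS)           # [[1, -1], [1, 0]]
print(ST)           # [[0, -1], [1, 1]]
print(TS == ST)     # False
```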
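Diagonalization


The so-called identity matrix

\[ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

corresponds to the identity mapping $I(x) = x$ for all $x$ in $\mathbf{R}^2$. Generally,
if $\lambda^1$ and $\lambda^2$ are real numbers, the diagonal matrix

\[ \begin{bmatrix} \lambda^1 & 0 \\ 0 & \lambda^2 \end{bmatrix} \]

corresponds to axial scaling, $(x^1, x^2) \mapsto (\lambda^1 x^1, \lambda^2 x^2)$. Diagonal
matrices are among the simplest matrices. In particular, if $n$ is a positive
integer, the $n$th power of a diagonal matrix is trivially calculated:

\[
\begin{bmatrix} \lambda^1 & 0 \\ 0 & \lambda^2 \end{bmatrix}^n
= \begin{bmatrix} (\lambda^1)^n & 0 \\ 0 & (\lambda^2)^n \end{bmatrix}.
\]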
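
The following Python sketch compares the two ways of computing a power of a diagonal matrix: repeated matrix multiplication, and raising each diagonal entry to the power. The diagonal entries 2 and 3 and the helper names `matmul` and `power` are hypothetical choices for illustration.

```python
def matmul(B, A):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]

def power(M, n):
    """Compute M^n for a positive integer n by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = matmul(result, M)
    return result

D = [[2, 0],
     [0, 3]]    # hypothetical diagonal entries
n = 5

by_multiplication = power(D, n)
entrywise = [[2 ** n, 0],
             [0, 3 ** n]]
print(by_multiplication == entrywise)  # True
```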
The solution to a variety of mathematical problems rests on our
ability to compute arbitrary powers of a matrix. We are naturally led
to ask: If $S : \mathbf{R}^2 \to \mathbf{R}^2$ is a linear transformation, does there exist
a coordinate system in which $S$ acts by axial scaling? This question
turns out to reduce to existence of scalars $\lambda^1$ and $\lambda^2$, and of non-zero
vectors $v_1$ and $v_2$, such that

\[ S(v_1) = \lambda^1 v_1, \qquad S(v_2) = \lambda^2 v_2. \]

Each $\lambda^i$ is an eigenvalue of $S$; each $v_i$ is an eigenvector of $S$. A pair of
non-proportional eigenvectors in the plane is an eigenbasis for $S$.

A linear transformation may or may not admit an eigenbasis. The
rotation $S$ of the preceding example has no real eigenvalues at all. The
shear $T$ has one real eigenvalue, and admits an eigenvector, but has
no eigenbasis. The compositions $TS$ and $ST$ both turn out to admit
eigenbases.


Structural Summary 


Basic linear algebra has three parallel “levels”. In increasing order of 
abstraction, they are: 


(i) “The level of entries”: Column vectors and matrices written out
as arrays of numbers. ($y^i = \sum_j A^i_j x^j$.)

(ii) “The level of matrices”: Column vectors and matrices written as
single entities. ($y = Ax$.)

(iii) “The abstract level”: Vectors (defined axiomatically) and linear
transformations (mappings that distribute over linear combinations).
($y = S(x)$.)


Linear algebra is, at heart, the study of linear combinations and 
mappings that “respect” them. Along your journey through the ma- 
terial, strive to detect the levels’ respective viewpoints and idioms. 
Among the most universal idioms is this: A linear combination of linear 
combinations is a linear combination. 

Matrices are designed expressly to handle the bookkeeping details.
In summation notation at level (i), if

\[ z^i = \sum_{k=1}^{m} B^i_k y^k \qquad \text{and} \qquad y^k = \sum_{j=1}^{n} A^k_j x^j, \]

then substitution of the second into the first gives

\[
z^i = \sum_{k=1}^{m} B^i_k \sum_{j=1}^{n} A^k_j x^j
    = \sum_{j=1}^{n} \Bigl( \sum_{k=1}^{m} B^i_k A^k_j \Bigr) x^j
    = \sum_{j=1}^{n} (BA)^i_j x^j.
\]

Once we establish properties of matrix operations, the preceding can
be distilled down to an extremely simple computation at level (ii): If
$z = By$ and $y = Ax$, then $z = B(Ax) = (BA)x$.


Organization of the Book 


The first chapter introduces real matrices as formally and quickly as 
feasible. The goal is to construct machinery for flexible computation. 
The book proceeds to introduce vector spaces and their properties, 
two auxiliary pieces of algebraic machinery (the dot product, and the 
determinant function on square matrices, each of which has a useful 
geometric interpretation), linear transformations and their properties, 
and diagonalization. 

Of necessity, the motivation for a particular definition may not be 
immediately apparent. At each stage, we are merely generalizing and 
systematizing the preceding summary. It may help to review this pref- 
ace periodically as you proceed through the material. 
Chapter 1


Matrix Algebra 


Definition 1.1. Let $m$ and $n$ be positive integers. An $m \times n$ matrix
is a rectangular array having $m$ rows and $n$ columns:

\[
A = \begin{bmatrix}
A^1_1 & A^1_2 & \cdots & A^1_n \\
A^2_1 & A^2_2 & \cdots & A^2_n \\
\vdots & \vdots & \ddots & \vdots \\
A^m_1 & A^m_2 & \cdots & A^m_n
\end{bmatrix},
\]

or simply $[A^i_j]$ for brevity if the size is implicit (or unimportant). The
$(i,j)$ entry $A^i_j$ is written in the $i$th row and $j$th column.

If $A = [A^i_j]$ and $B = [B^i_j]$ are $m \times n$ real matrices, their sum is the
$m \times n$ matrix defined by $A + B = [A^i_j + B^i_j]$.

If $c$ is real, we define scalar multiplication by $cA = c[A^i_j] = [cA^i_j]$.


Remark 1.2. The superscripts are row indices, not exponents. Explicit 
mention is made in this book when superscripts do stand for exponents. 
The idiomatic value of this notational convention will gradually become 
apparent. 

When we write $[A^i_j]$, the indices $i$ and $j$ are dummies, having no
meaning outside the brackets. We may use any convenient letters
without changing the meaning, e.g., $[A^i_j] = [A^k_\ell]$, so long as the row index
runs from 1 to $m$ and the column index runs from 1 to $n$. What
matters is that an unambiguous matrix entry is associated to each pair of
indices.


Remark 1.3. In this book, the entries of a matrix are usually real
numbers. We may speak of real matrices to emphasize this fact. The set of
all $m \times n$ real matrices is denoted $\mathbf{R}^{m \times n}$.

However, the entries of a matrix might be other kinds of numbers
(leading us to speak, for example, of integer matrices or complex
matrices), or even other matrices (in which case we speak of block matrices).
The space of all complex $m \times n$ matrices is denoted $\mathbf{C}^{m \times n}$.


Definition 1.4. A matrix of size $n \times 1$ is called a column. We denote
the set of columns by $\mathbf{R}^n$ (read “RN”) rather than by $\mathbf{R}^{n \times 1}$.

A matrix of size $1 \times n$ is called a row. We denote the set of rows
by $(\mathbf{R}^n)^*$ (read “RN star”) rather than by $\mathbf{R}^{1 \times n}$.

If $A = [A^i_j]$ is an $m \times n$ matrix, $A^i$ denotes the $i$th row, an element
of $(\mathbf{R}^n)^*$, and $A_j$ denotes the $j$th column, in $\mathbf{R}^m$.


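Definition 1.5. If $i$ and $j$ are integers, the expression

\[
\delta^i_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}
\]

is called the Kronecker delta-symbol.


Example 1.6. The following matrices are all defined by $A^i_j = \delta^i_j$:

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.
\]


Definition 1.7. Let $n$ be a positive integer. For each $j = 1, \dots, n$, we
define $e_j$ to be the element of $\mathbf{R}^n$ whose $i$th row is $\delta^i_j$. That is, the
$j$th row is 1 and all other entries are 0:

\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad
e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \qquad \dots, \qquad
e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.
\]

The ordered set $(e_j)_{j=1}^n$ is called the standard basis of $\mathbf{R}^n$.


Definition 1.8. Let $n$ be a positive integer. For each $i = 1, \dots, n$, we
define $e^i$ to be the element of $(\mathbf{R}^n)^*$ whose $j$th column is $\delta^i_j$. That is,
the $i$th column is 1 and all other entries are 0:

\[
e^1 = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}, \qquad
e^2 = \begin{bmatrix} 0 & 1 & \cdots & 0 \end{bmatrix}, \qquad \dots, \qquad
e^n = \begin{bmatrix} 0 & 0 & \cdots & 1 \end{bmatrix}.
\]

The ordered set $(e^i)_{i=1}^n$ is called the standard dual basis of $(\mathbf{R}^n)^*$.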
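
The Kronecker delta and the standard (dual) basis are easy to generate by machine. In the Python sketch below, all function names (`delta`, `e_col`, `e_row`, `delta_matrix`) are hypothetical; indices are passed starting from 1 to match the text, and a matrix is stored as a list of rows.

```python
def delta(i, j):
    """Kronecker delta: 1 if i == j, otherwise 0."""
    return 1 if i == j else 0

def e_col(j, n):
    """Standard basis vector e_j of R^n, as an n x 1 matrix."""
    return [[delta(i, j)] for i in range(1, n + 1)]

def e_row(i, n):
    """Standard dual basis vector e^i of (R^n)^*, as a 1 x n matrix."""
    return [[delta(i, j) for j in range(1, n + 1)]]

def delta_matrix(m, n):
    """The m x n matrix whose (i, j) entry is delta(i, j)."""
    return [[delta(i, j) for j in range(1, n + 1)] for i in range(1, m + 1)]

print(e_col(2, 3))         # [[0], [1], [0]]
print(e_row(3, 3))         # [[0, 0, 1]]
print(delta_matrix(3, 2))  # [[1, 0], [0, 1], [0, 0]]
```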
Definition 1.9. Let $n$ be a positive integer. A matrix of size $n \times n$ is
called a square matrix.

The main diagonal of a square matrix is the set of entries $(A^i_i)_{i=1}^n$
running from the upper left to the lower right. A square matrix is
diagonal if every non-diagonal entry is 0. The diagonal matrix $I_n$ in $\mathbf{R}^{n \times n}$
whose $(i,j)$ entry is $\delta^i_j$ is called the ($n \times n$) identity matrix.


Example 1.10. The identity matrices are

\[
I_1 = \begin{bmatrix} 1 \end{bmatrix}, \qquad
I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
I_n = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}.
\]


Remark 1.11. The $j$th column of $I_n$ is the standard basis vector $e_j$. In
other words, $I_n$ may be written as a (block) row matrix whose $j$th
column is $e_j$, namely, $I_n = [e_1 \ e_2 \ \dots \ e_n]$.

Similarly, the $i$th row of $I_n$ is $e^i$, so $I_n$ may be written as a (block)
column matrix whose $i$th row is $e^i$.


Definition 1.12. If $A = [A^i_j] \in \mathbf{R}^{m \times n}$, the transpose of $A$ is the
matrix $A^T$ in $\mathbf{R}^{n \times m}$ whose $(i,j)$ entry is $A^j_i$.


Example 1.13. [Equation: a matrix and its transpose], and vice versa.

Remark 1.14. The transpose of a row matrix is a column, and vice versa.
Particularly, the transpose defines a map from $\mathbf{R}^n$ (the set of columns)
to $(\mathbf{R}^n)^*$ (the set of rows). For example, $e_i^T = e^i$ and $(e^i)^T = e_i$.

Generally, the $i$th row of $A^T$ is the transpose of the $i$th column of $A$,
and the $j$th column of $A^T$ is the transpose of the $j$th row of $A$. The
transpose of $A^T$ is $A$ itself: $(A^T)^T = A$.

Remark 1.15. If $A = [A^i_j]$ and $B = [B^i_j]$ are in $\mathbf{R}^{m \times n}$, and if $c$ is real,
then $(cA + B)^T = cA^T + B^T$. This is immediate by comparing entries:
$(cA + B)^j_i = cA^j_i + B^j_i$.
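
A short Python sketch of the transpose, with a numerical check of the identity in Remark 1.15 on one pair of matrices. The helper names `transpose` and `combo`, and the sample entries, are assumptions for illustration.

```python
def transpose(A):
    """Return A^T: its (i, j) entry is the (j, i) entry of A."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def combo(c, A, B):
    """Return cA + B, computed entry by entry."""
    return [[c * a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

A = [[1, 2, 3],
     [4, 5, 6]]      # hypothetical 2 x 3 matrices
B = [[0, 1, 0],
     [-1, 0, 2]]
c = 3

print(transpose(A))                     # [[1, 4], [2, 5], [3, 6]]
lhs = transpose(combo(c, A, B))         # (cA + B)^T
rhs = combo(c, transpose(A), transpose(B))  # cA^T + B^T
print(lhs == rhs)                       # True
print(transpose(transpose(A)) == A)     # True
```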


1.1 Matrix Multiplication 


An $m \times n$ real matrix is a “data structure” containing $mn$ real
numbers. Matrix addition and scalar multiplication merely “parallelize”
real addition and real multiplication. A third operation, matrix
multiplication, “mixes” entries. The definition is introduced in two stages.


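Definition 1.16. Let $m$, $p$, and $n$ be positive integers. Assume

\[
\xi = [\xi_k] = \begin{bmatrix} \xi_1 & \xi_2 & \cdots & \xi_p \end{bmatrix} \in (\mathbf{R}^p)^*, \qquad
v = [v^k] = \begin{bmatrix} v^1 \\ v^2 \\ \vdots \\ v^p \end{bmatrix} \in \mathbf{R}^p.
\]

We define the product or dual pairing $\langle\ \mid\ \rangle_p : (\mathbf{R}^p)^* \times \mathbf{R}^p \to \mathbf{R}$ by

\[
\langle \xi \mid v \rangle_p
= \begin{bmatrix} \xi_1 & \xi_2 & \cdots & \xi_p \end{bmatrix}
  \begin{bmatrix} v^1 \\ v^2 \\ \vdots \\ v^p \end{bmatrix}
= \xi_1 v^1 + \xi_2 v^2 + \cdots + \xi_p v^p
= \sum_{k=1}^{p} \xi_k v^k.
\]

Generally, if $A = [A^i_k] \in \mathbf{R}^{m \times p}$ and $B = [B^k_j] \in \mathbf{R}^{p \times n}$, their
(matrix) product is the $m \times n$ matrix $C = [C^i_j]$ defined by

\[
C^i_j = \langle A^i \mid B_j \rangle_p
      = A^i_1 B^1_j + A^i_2 B^2_j + \cdots + A^i_p B^p_j
      = \sum_{k=1}^{p} A^i_k B^k_j.
\]

That is, $C^i_j$ is the dual pairing of the $i$th row of $A$, an element of $(\mathbf{R}^p)^*$,
and the $j$th column of $B$, an element of $\mathbf{R}^p$.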
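
The two-stage definition can be mirrored in code: first a pairing of a row with a column, then a product built from pairings. The Python sketch below is one possible rendering; the names `pairing`, `column`, and `matmul` and the sample matrices are hypothetical.

```python
def pairing(row, col):
    """Dual pairing of a row and a column with the same number p of entries."""
    if len(row) != len(col):
        raise ValueError("row and column must have the same length")
    return sum(r * c for r, c in zip(row, col))

def column(B, j):
    """The j-th column of B (0-based), as a flat list."""
    return [B[k][j] for k in range(len(B))]

def matmul(A, B):
    """Product of an m x p matrix A and a p x n matrix B, entry by entry."""
    return [[pairing(A[i], column(B, j)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 0],
     [0, 1, -1]]     # 2 x 3, so m = 2 and p = 3
B = [[1, 0],
     [2, 1],
     [0, 3]]         # 3 x 2, so p = 3 and n = 2

print(matmul(A, B))  # [[5, 2], [2, -2]]
```

The size check in `pairing` is what enforces the requirement that the number of columns of the first factor equal the number of rows of the second.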


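Example 1.17. Let $n$ be a positive integer, $(e_j)_{j=1}^n$ the standard basis
of $\mathbf{R}^n$, and $(e^i)_{i=1}^n$ the standard dual basis of $(\mathbf{R}^n)^*$. For $1 \le i, j \le n$,
we have $e^i e_j = \delta^i_j$, a $1 \times 1$ matrix. (In this “inner” product, $p = n$.)

Example 1.18. Let $m$ and $n$ be positive integers, $(e_i)_{i=1}^m$ the standard
basis of $\mathbf{R}^m$, and $(e^j)_{j=1}^n$ the standard dual basis of $(\mathbf{R}^n)^*$. For each
$i = 1, \dots, m$ and $j = 1, \dots, n$, the “outer” product $e_i^j = e_i e^j$ (for
which $p = 1$) is the $m \times n$ matrix having a 1 in the $(i,j)$ entry and
0’s everywhere else:

\[
e_i e^j
= \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}
  \begin{bmatrix} 0 & \cdots & 1 & \cdots & 0 \end{bmatrix}
= \begin{bmatrix}
    0 & \cdots & 0 & \cdots & 0 \\
    \vdots & & \vdots & & \vdots \\
    0 & \cdots & 1 & \cdots & 0 \\
    \vdots & & \vdots & & \vdots \\
    0 & \cdots & 0 & \cdots & 0
  \end{bmatrix}
= e_i^j.
\]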
Matrix multiplication is associative, has an identity element,
commutes with scalar multiplication, and distributes over addition. These
properties allow us to work algebraically with matrices as individual
entities, often avoiding the messiness of working with large collections
of entries. To state these properties precisely requires only that we
exercise care regarding sizes of operands.


Theorem 1.19. Let $m$, $p$, $q$, $n$ be positive integers, $c$ a real number,
$A = [A^i_k] \in \mathbf{R}^{m \times p}$, $B = [B^k_\ell]$, $B' = [{B'}^k_\ell] \in \mathbf{R}^{p \times q}$, $C = [C^\ell_j] \in \mathbf{R}^{q \times n}$.

(i) $(AB)C = A(BC)$;

(ii) $I_m A = A$ and $A I_p = A$;

(iii) $A(cB) = c(AB)$;

(iv) $A(B + B') = AB + AB'$ and $(B + B')C = BC + B'C$.


Sketch of proof. In each part, it suffices to show that two matrices have
the same $(i,j)$ entries for all indices $i$ and $j$.

(i). It suffices to prove $\sum_\ell (AB)^i_\ell C^\ell_j = \sum_k A^i_k (BC)^k_j$. But

\[
\sum_{\ell=1}^{q} \Bigl( \sum_{k=1}^{p} A^i_k B^k_\ell \Bigr) C^\ell_j
= \sum_{\ell=1}^{q} \sum_{k=1}^{p} A^i_k B^k_\ell C^\ell_j
= \sum_{k=1}^{p} A^i_k \Bigl( \sum_{\ell=1}^{q} B^k_\ell C^\ell_j \Bigr).
\]

(ii). For $1 \le i \le m$ and $1 \le j \le p$, we have

\[ (A I_p)^i_j = \sum_{k=1}^{p} A^i_k \delta^k_j = A^i_j; \]

because of the factor $\delta^k_j$, only the term with $k = j$ is non-zero. The
other identity is similar.

(iii). $\displaystyle \sum_{k=1}^{p} A^i_k (cB^k_\ell) = \sum_{k=1}^{p} c A^i_k B^k_\ell = c \sum_{k=1}^{p} A^i_k B^k_\ell$.

(iv). $\displaystyle \sum_{k=1}^{p} A^i_k (B^k_\ell + {B'}^k_\ell) = \sum_{k=1}^{p} (A^i_k B^k_\ell + A^i_k {B'}^k_\ell)$.
The second distributive law is similar.
Remark 1.20. In the proof of (ii), multiplying by a Kronecker delta and
summing selects one term. This computational idiom is important. 

Despite the “mixing” of entries in the definition of matrix multi- 
plication, each identity comes down to a corresponding property for 
ordinary real number arithmetic. 


Multiplication by a standard row or column “selects” part of A, 
yielding useful criteria for two matrices to be equal. 


Corollary 1.21. Let $A$ be an $m \times n$ matrix, and let $i$ and $j$ be arbitrary
indices with $1 \le i \le m$ and $1 \le j \le n$. Viewing $e^i$ as a row matrix
in $(\mathbf{R}^m)^*$ and $e_j$ as a column matrix in $\mathbf{R}^n$:

(i) The product $e^i A$ is $A^i$, the $i$th row of $A$;

(ii) The product $A e_j$ is $A_j$, the $j$th column of $A$;

(iii) The product $e^i A e_j$ is $A^i_j$, the $(i,j)$ entry of $A$.

Particularly, if $A e_j = 0^m$ for $j = 1, \dots, n$, then $A = 0^{m \times n}$.


Proof. Since $e^i$ is the $i$th row of $I_m$ and $e_j$ is the $j$th column of $I_n$,
(i), (ii), and (iii) follow immediately from Theorem 1.19 (ii). Finally, if
$A e_j$ is the zero vector for every $j$, then every entry of $A$ is zero.
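
The “selection” property can be checked on a small case. In this Python sketch the helper names and the sample matrix are hypothetical; indices passed to `e_row` and `e_col` start from 1, as in the text.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def e_row(i, m):
    """Standard dual basis row e^i, a 1 x m matrix."""
    return [[1 if k == i else 0 for k in range(1, m + 1)]]

def e_col(j, n):
    """Standard basis column e_j, an n x 1 matrix."""
    return [[1 if k == j else 0] for k in range(1, n + 1)]

A = [[1, 2, 3],
     [4, 5, 6]]      # hypothetical 2 x 3 matrix: m = 2, n = 3
i, j = 2, 3

row_i = matmul(e_row(i, 2), A)        # e^i A, the i-th row of A
col_j = matmul(A, e_col(j, 3))        # A e_j, the j-th column of A
entry = matmul(row_i, e_col(j, 3))    # e^i A e_j, the (i, j) entry

print(row_i)  # [[4, 5, 6]]
print(col_j)  # [[3], [6]]
print(entry)  # [[6]]
```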


Remark 1.22. To summarize, if $Ax = 0^m$ for all $x$ in $\mathbf{R}^n$ (or even if
this equation holds merely for the standard basis vectors), then $A$ is
the zero matrix. (The converse is obvious.)

Analogously, if $y^T A = (0^n)^T$ for all $y$ in $\mathbf{R}^m$ (or even if $e^i A = (0^n)^T$
for $i = 1, \dots, m$), then $A = 0^{m \times n}$.


Corollary 1.23. If $A$ and $A'$ are $m \times n$ matrices, and if $A e_j = A' e_j$
for $j = 1, \dots, n$, then $A = A'$.


Proof. Apply Corollary 1.21 to the difference $A - A'$, and use the
distributive law to note that $(A - A') e_j = A e_j - A' e_j$.


Non-Commutativity of Matrix Multiplication 


Matrix multiplication is not commutative. Even if $A$ and $B$ are both
$n \times n$, so that both products $AB$ and $BA$ are defined and have the
same size, it need not be the case that

\[
(AB)^i_j = \sum_{k=1}^{n} A^i_k B^k_j \qquad \text{and} \qquad (BA)^i_j = \sum_{k=1}^{n} B^i_k A^k_j
\]

are equal for all $i$ and $j$.


Example 1.24. The matrices

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\quad \text{satisfy} \quad
AB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad
BA = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
\]


Example 1.25. If $A$ is an $n \times n$ matrix, then $A$ commutes with matrices
including $I_n$, $A$, $A^2$, and in general every “polynomial” in $A$.


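Theorem 1.26. If $A = [A^i_k] \in \mathbf{R}^{m \times p}$ and $B = [B^k_j] \in \mathbf{R}^{p \times n}$, then
$(AB)^T = B^T A^T$ as elements of $\mathbf{R}^{n \times m}$.


Proof. Since $(AB)^i_j = \sum_k A^i_k B^k_j$, the $(i,j)$ entry of $(AB)^T$ is

\[
(AB)^j_i = \sum_{k=1}^{p} A^j_k B^k_i
         = \sum_{k=1}^{p} B^k_i A^j_k
         = \sum_{k=1}^{p} (B^T)^i_k (A^T)^k_j,
\]

the $(i,j)$ entry of $B^T A^T$.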
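
A numerical spot check of the theorem, as a Python sketch with hypothetical helper names and sample matrices. The matrices are deliberately non-square so that the sizes of the factors matter.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Return A^T: its (i, j) entry is the (j, i) entry of A."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

A = [[1, 2, 0],
     [0, 1, -1]]     # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, 3]]         # 3 x 2

lhs = transpose(matmul(A, B))              # (AB)^T
rhs = matmul(transpose(B), transpose(A))   # B^T A^T
print(lhs)         # [[1, -2], [2, -2]]
print(lhs == rhs)  # True
```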


Remark 1.27. A “visual” proof of Theorem 1.26 can be given by
considering the following diagrammatic depiction of matrix multiplication:

[Diagram: the matrix $A$ at left with its $i$th row $A^i$, the matrix $B$ above with its $j$th column $B_j$, and the product $AB$ with the entry $(AB)^i_j$ where that row and column meet.]

Reflecting the paper across the diagonal line has the effect of
transposing and reversing the order of the factors, and of transposing the
product.
Invertibility of Square Matrices


Definition 1.28. Let $n$ be a positive integer. An $n \times n$ matrix $A$ is
invertible if there exists an $n \times n$ matrix $B$ such that $AB = I_n$ and
$BA = I_n$. Such a matrix $B$ is called an inverse of $A$.


Remark 1.29. Part (i) of the next result shows an invertible matrix
has a unique inverse. We may therefore speak of the inverse of $A$, and
denote it $A^{-1}$. Note that $A^{-1}$ is invertible, and $(A^{-1})^{-1} = A$.


Theorem 1.30. Let $A$, $B$, and $B'$ be $n \times n$ matrices.

(i) If $B$ and $B'$ are inverses of $A$, then $B = B'$.

(ii) If $AB = I_n$, and if either $A$ or $B$ is invertible, then $BA = I_n$, so
$B = A^{-1}$.

(iii) If $A$ and $B$ are invertible, then their product $AB$ is invertible, and
$(AB)^{-1} = B^{-1} A^{-1}$.

(iv) If $A$ is invertible, then $A^T$ is invertible, and $(A^T)^{-1} = (A^{-1})^T$.
Proof. Each assertion is a general consequence of associativity:

(i). By hypothesis, $BA = I_n$ and $AB' = I_n$. Consequently,

\[ B = B I_n = B(AB') = (BA)B' = I_n B' = B'. \]

(ii). If $AB = I_n$ and $A$ is invertible, multiplying on the left by $A^{-1}$
and on the right by $A$ gives

\[ I_n = A^{-1} I_n A = A^{-1}(AB)A = (A^{-1}A)BA = BA. \]

If instead $B$ is invertible, $I_n = B I_n B^{-1} = B(AB)B^{-1} = BA$.

(iii). By definition, an inverse of $AB$ is any matrix $C$ such that
$(AB)C = I_n$ and $C(AB) = I_n$. It suffices to check $(AB)(B^{-1}A^{-1}) = I_n$
and $(B^{-1}A^{-1})(AB) = I_n$. By associativity of matrix multiplication,

\[
\begin{aligned}
(AB)(B^{-1}A^{-1}) &= ((AB)B^{-1})A^{-1} \\
                   &= (A(BB^{-1}))A^{-1} = (A I_n)A^{-1} = AA^{-1} = I_n.
\end{aligned}
\]

The other equality is similar.

(iv). By Theorem 1.26, $A^T (A^{-1})^T = (A^{-1}A)^T = I_n^T = I_n$ and
$(A^{-1})^T A^T = (AA^{-1})^T = I_n^T = I_n$.
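
The order reversal in part (iii) is easy to get wrong, so a numerical check may help. In the Python sketch below, the two invertible matrices and their inverses are hypothetical examples with integer entries, supplied by hand rather than computed.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

I2 = [[1, 0], [0, 1]]

# Hypothetical invertible matrices with their inverses, chosen for illustration.
A, A_inv = [[2, 1], [1, 1]], [[1, -1], [-1, 2]]
B, B_inv = [[1, 2], [0, 1]], [[1, -2], [0, 1]]

AB = matmul(A, B)
candidate = matmul(B_inv, A_inv)        # B^{-1} A^{-1}
print(matmul(AB, candidate) == I2)      # True
print(matmul(candidate, AB) == I2)      # True

wrong_order = matmul(A_inv, B_inv)      # A^{-1} B^{-1}
print(matmul(AB, wrong_order) == I2)    # False
```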
1.2 Systems of Linear Equations


Definition 1.31. Let $A = [A^i_j]$ be a real $m \times n$ matrix. The equation
$Ax = 0^m$, with $x = (x^1, \dots, x^n)$ unknown, is called a (homogeneous)
linear system of $m$ equations in $n$ variables.

If $b = (b^1, \dots, b^m) \neq 0^m$ is a fixed element of $\mathbf{R}^m$, the equation
$Ax = b$ is called a non-homogeneous linear system of $m$ equations in
$n$ unknowns. If there exists an $x_0$ such that $Ax_0 = b$, the system is
said to be consistent.

The matrix $A$ is called the coefficient matrix. The $m \times (n+1)$ block
matrix $[A \mid b]$ is called the augmented matrix.


The zero vector $x = 0^n$ always satisfies the homogeneous system.
The general goal with a homogeneous system is to describe the set
of solutions, and in particular, to determine whether the system has
non-trivial solutions (i.e., solutions other than $0^n$).

A non-homogeneous system may or may not be consistent. The
solution set of a consistent non-homogeneous system is closely related
to the solution set of the associated homogeneous system; see
Theorem 1.32.

An algorithm for solving linear systems is presented below; the
output of the algorithm is an equivalent linear system (i.e., having precisely
the same set of solutions) that can be solved (or shown to be
inconsistent) by inspection.


Theorem 1.32. Let $A$ be an $m \times n$ real matrix and $b$ an element of $\mathbf{R}^m$.
If there exists a solution $x_0$, i.e., an element of $\mathbf{R}^n$ satisfying $Ax_0 = b$,
then $Ax = b$ if and only if $x = x_0 + x_h$ for some $x_h$ in $\mathbf{R}^n$ satisfying
$Ax_h = 0^m$.


Proof. By hypothesis, $Ax_0 = b$. Let $x_h$ be an arbitrary element of $\mathbf{R}^n$.
If we write $x = x_0 + x_h$, then $Ax = b$ if and only if $Ax = Ax_0$, if and
only if

\[ 0^m = Ax - Ax_0 = A(x - x_0) = Ax_h. \]
Row Reduction 


Fix an $m \times n$ matrix $A$ and a column $b$ in $\mathbf{R}^m$. (We allow $b = 0^m$,
and treat the homogeneous and non-homogeneous cases together.) Let
$A^i$ denote the $i$th row of $A$. The matrix equation $Ax = b$ is equivalent,
by equating components, to the system of $m$ scalar equations

\[ A^1 x = b^1, \quad A^2 x = b^2, \quad \dots, \quad A^m x = b^m. \]


We begin by giving precise criteria for a coefficient matrix under 
which a system can be “solved by inspection”. 


Definition 1.33. Let $A$ be a real $m \times n$ matrix with rows $A^i$.

A row $A^i$ has a leading 1 if the row is not zero, and the first (leftmost)
non-zero entry is 1.

The matrix $A$ is in row-echelon form if:


(i) Every non-zero row of $A$ has a leading 1;


(ii) Successive leading 1’s occur further to the right. That is, if the
leading 1’s occur in positions $(1,k_1)$, $(2,k_2)$, ..., $(\ell,k_\ell)$, then the
column indices increase: $k_1 < k_2 < \cdots < k_\ell$;


(iii) Zero rows (if any) are collected at the bottom. 


The matrix A is in reduced row-echelon form if A is in row-echelon 
form, and in addition each leading 1 is the only non-zero entry in its 
column. 


Remark 1.34. Though an m x n matrix A has infinitely many row- 
echelon forms, the reduced row-echelon form of A is unique, see Theo- 
rem 1.42. 


Remark 1.35. If a coefficient matrix is in row-echelon form, the num- 
ber of leading 1’s is at most min(m,n), the smaller of the number of 
variables and the number of equations. 


Definition 1.36. If the leading 1’s in the reduced row-echelon matrix
of $A$ have positions $(1,k_1), \dots, (\ell,k_\ell)$, the variables $x^{k_1}, \dots, x^{k_\ell}$ of the
system $Ax = b$ are said to be basic; the remaining variables are free.


Remark 1.37. A non-homogeneous system $Ax = b$ is inconsistent (has
no solution) if and only if the augmented matrix (reduced to
row-echelon form) has a leading 1 in the last column; such a row corresponds
to the inconsistent equation $0 = 1$.

When the augmented matrix of a consistent system is in reduced
row-echelon form, the basic variables are expressed in terms of the free
variables, and the free variables may take arbitrary real values.

A homogeneous system has a non-trivial solution if and only if there
is at least one free variable. Particularly, if $m < n$ (more variables than
equations), a homogeneous system $Ax = 0^m$ has a non-trivial solution.
Elementary Row Operations


Next we describe a set of transformations that do not change the solu- 
tion set of a linear system, and suffice to bring the augmented matrix 
to reduced row-echelon form. 


I. Add a multiple of one equation to another equation. 
II. Multiply some equation by a non-zero real number. 
III. Swap two equations. 


Symbolically, if $i$ and $i'$ denote distinct indices, the preceding
operations correspond to


I. Replacing $A^i x = b^i$ with $(A^i + cA^{i'})x = b^i + cb^{i'}$ for some real $c$.
II. Replacing $A^i x = b^i$ with $cA^i x = cb^i$ for some $c \neq 0$.
III. Exchanging $A^i x = b^i$ and $A^{i'} x = b^{i'}$.


Proposition 1.38. Each of the preceding operations preserves the set
of solutions.


Proof. Operations II. and III. obviously do not change the set of
solutions. To see the same is true for I., note that if $A^i x = b^i$ and $A^{i'} x = b^{i'}$,
then

\[ (A^i + cA^{i'})x = b^i + cb^{i'} \quad \text{and} \quad A^{i'} x = b^{i'}; \]

that is, every solution of the system before I. is a solution of the system
after I. Conversely, if

\[ (A^i + cA^{i'})x = b^i + cb^{i'} \quad \text{and} \quad A^{i'} x = b^{i'}, \]

then adding $-c$ times the second to the first gives

\[ A^i x = b^i \quad \text{and} \quad A^{i'} x = b^{i'}; \]

that is, every solution of the system after I. is a solution of the system
before I. Conceptually, each of the three operations is reversible.


A system of equations $Ax = b$ can be recovered from its augmented
matrix; there is no need to “carry along” the variables $x$ when
calculating. Performed on the augmented matrix, the preceding
transformations are called elementary row operations of type I. (add a multiple of
one row to another row), II. (multiply a row by a non-zero number), or
III. (swap two rows).

The row-reduction algorithm has two “stages”, each recursive. The
first stage puts the augmented matrix into row-echelon form; the second
puts the augmented matrix in reduced row-echelon form.


1.0. Find the leftmost column of coefficients that is not zero; call this
the “active” column.


1.1. Pick a non-zero entry in the active column and use operations of
type I. or II. to make that entry equal to 1. Then swap this row
with the first row, so the active column has 1 as its topmost entry.


1.2. Use operations of type I. to make every entry in the active column
(except for the topmost) equal to 0. Specifically, if the $i$th row of
the active column contains $a^i$, add $-a^i$ times the first row to the
$i$th row.


1.3. Mentally cross out the first row. If there are rows remaining, start
over at Step 1.0; otherwise the first stage is complete.


The algorithm terminates after finitely many steps because in each
“round”, the matrix being modified has one fewer row and at least one
fewer non-zero column.

To put a row-echelon matrix into reduced row-echelon form, we need
only make the entries “above” each leading 1 equal to 0.


2.1. Find the rightmost leading 1 and use row operations of type I. to
cancel the non-zero entries “above” it (similarly to Step 1.2).


2.2. Mentally cross out the last non-zero row. If there are more
leading 1’s, start over at Step 2.1; otherwise the algorithm is complete.
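
The two stages can be sketched in Python as follows. This is one possible rendering, not the book's own code: the function name `rref` is hypothetical, exact arithmetic is done with the standard `fractions.Fraction` type, the pivot is always the first non-zero entry found (made equal to 1 by a type II. operation), and every column, including the last column of an augmented matrix, is treated alike.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row-echelon form of M (a list of rows)."""
    M = [[Fraction(a) for a in row] for row in M]
    rows, cols = len(M), len(M[0])
    top = 0          # rows above `top` have been "crossed out"
    pivots = []      # positions (row, column) of the leading 1's

    # Stage 1: row-echelon form.
    for col in range(cols):
        if top == rows:
            break
        # Look for a non-zero entry in this column among the remaining rows.
        pivot = next((r for r in range(top, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue                     # not an active column
        lead = M[pivot][col]
        M[pivot] = [a / lead for a in M[pivot]]      # type II: scale to get 1
        M[top], M[pivot] = M[pivot], M[top]          # type III: swap into place
        for r in range(top + 1, rows):               # type I: clear below
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[top])]
        pivots.append((top, col))
        top += 1

    # Stage 2: clear the entries above each leading 1, rightmost first.
    for row, col in reversed(pivots):
        for r in range(row):
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[row])]
    return M

# A small augmented matrix (the system 3x^1 + x^2 + 3x^3 = -11, 2x^1 + 4x^3 = -10).
augmented = [[3, 1, 3, -11],
             [2, 0, 4, -10]]
print(rref(augmented) == [[1, 0, 2, -5],
                          [0, 1, -3, 4]])   # True
```

Because this sketch always scales the first non-zero entry it finds, its intermediate steps may involve fractions that a hand computation would avoid; by uniqueness of the reduced row-echelon form, the final result is the same.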


Example 1.39. Describe (recalling that superscripts are indices, not 
exponents) the solution set of the non-homogeneous linear system 


301 + 2? + 323 = -11, eas ees ale 
Qa1 + 4x3 = —10, a a le 


The augmented matrix at right is read off by inspection. The row 
reduction algorithm operates on this matrix. 


Round 1 (Step 0) The first column is not 0, so becomes “active”. 
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(Step 1) Our goal is to produce a 1 in the first column with elemen- 
tary row operations. There are at least three strategies: Multiply the 
first row by 1/3; multiply the second row by 1/2; subtract the second 
row from the first. 

As a practical matter, avoiding fractions usually means less work. 
Since the coefficients in the second row are all even, dividing the second 
row by 2 avoids fractions; do this, then swap the rows: 


3 A. 3-2 
2 0 4] -10 


gh [Ok Bs) =aE0 
Tc) De]: 55 


RioR? |1 0 2) —5 
3 1 3]-I11]~ 


(Step 2) Subtract multiples of the first row from the remaining 
row(s) to “clear” the active column: 


R?-3R} F 0 al al 


LQ) 9 
0 1-3) 4|- 


6 ae Hes [ete 


(Step 3) Mentally cross out the first row, look at the remaining 
system of (m — 1) = 1 equation(s) in (n — 1) = 2 variables, and start 
over: 

e’—32°=4, [0 1 -3] 4]. 


Round 2 (Step 0) The (new) first non-zero column becomes active. 
(Step 1) The “active” coefficient is already 1, and there are no ad- 
ditional rows, so the first stage is complete. In fact, the augmented 
matrix is already in reduced row echelon form, so nothing needs to be 
done for the second stage. 
We have shown the original system is equivalent to (has the same 
set of solutions as) 


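\[
\begin{aligned}
x^1 + 2x^3 &= -5, \\
x^2 - 3x^3 &= 4,
\end{aligned}
\qquad
\left[\begin{array}{ccc|c} 1 & 0 & 2 & -5 \\ 0 & 1 & -3 & 4 \end{array}\right].
\]


The basic variables are $x^1$ and $x^2$, while $x^3$ is free. To describe an arbitrary
solution, write each basic variable in terms of the free variable(s):


\[
\begin{aligned}
x^1 &= -5 - 2x^3, \\
x^2 &= 4 + 3x^3, \\
x^3 &= x^3,
\end{aligned}
\qquad
x = \begin{bmatrix} -5 - 2x^3 \\ 4 + 3x^3 \\ x^3 \end{bmatrix}
= \begin{bmatrix} -5 \\ 4 \\ 0 \end{bmatrix} + x^3 \begin{bmatrix} -2 \\ 3 \\ 1 \end{bmatrix}.
\]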


On the right, the general solution of the original system is in “parametric
form”. To each real value of the “parameter” $x^3$ there corresponds
a solution $x$ of the original system.

Remark 1.40. In the preceding example, $x_p = (-5, 4, 0)$ is a “particular”
solution, and every scalar multiple of $(-2, 3, 1)$ is a solution of the
associated homogeneous system, as you should check; compare Theorem 1.32.
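The check suggested in Remark 1.40 can be carried out mechanically. This is a small sketch under our own naming (`apply`, `solution`, `x_p`, `x_h` are illustrative names, not notation from the text); it multiplies the coefficient matrix of Example 1.39 by the particular solution, by the homogeneous solution, and by a few members of the parametric family.

```python
def apply(A, x):
    """Matrix-vector product: entry i is the sum over j of A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Coefficient matrix and right-hand side of Example 1.39.
A = [[3, 1, 3],
     [2, 0, 4]]
b = [-11, -10]

x_p = [-5, 4, 0]   # particular solution
x_h = [-2, 3, 1]   # generator of the homogeneous solutions

def solution(t):
    """The solution x_p + t * x_h corresponding to the parameter value x^3 = t."""
    return [p + t * h for p, h in zip(x_p, x_h)]
```

Evaluating `apply(A, solution(t))` for several integers `t` should return `b` each time, while `apply(A, x_h)` should return the zero vector.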


Example 1.41. Find the general solution of 
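\[
\begin{aligned}
3x^1 + 12x^2 + x^3 + x^4 + 3x^5 &= 7, \\
2x^1 + 8x^2 + x^3 + x^4 + x^5 &= 5, \\
x^1 + 4x^2 + 3x^3 + x^4 - x^5 &= 5, \\
-x^1 - 4x^2 + 3x^3 - 2x^5 &= 1.
\end{aligned}
\]


\[
\begin{aligned}
&\left[\begin{array}{ccccc|c} 3 & 12 & 1 & 1 & 3 & 7 \\ 2 & 8 & 1 & 1 & 1 & 5 \\ 1 & 4 & 3 & 1 & -1 & 5 \\ -1 & -4 & 3 & 0 & -2 & 1 \end{array}\right]
\xrightarrow{R^1 \leftrightarrow R^3}
\left[\begin{array}{ccccc|c} 1 & 4 & 3 & 1 & -1 & 5 \\ 2 & 8 & 1 & 1 & 1 & 5 \\ 3 & 12 & 1 & 1 & 3 & 7 \\ -1 & -4 & 3 & 0 & -2 & 1 \end{array}\right] \\
&\xrightarrow[R^4 + R^1]{R^2 - 2R^1,\ R^3 - 3R^1}
\left[\begin{array}{ccccc|c} 1 & 4 & 3 & 1 & -1 & 5 \\ 0 & 0 & -5 & -1 & 3 & -5 \\ 0 & 0 & -8 & -2 & 6 & -8 \\ 0 & 0 & 6 & 1 & -3 & 6 \end{array}\right]
\xrightarrow{R^2 + R^4,\ -\frac{1}{2}R^3}
\left[\begin{array}{ccccc|c} 1 & 4 & 3 & 1 & -1 & 5 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 4 & 1 & -3 & 4 \\ 0 & 0 & 6 & 1 & -3 & 6 \end{array}\right] \\
&\xrightarrow{R^3 - 4R^2,\ R^4 - 6R^2}
\left[\begin{array}{ccccc|c} 1 & 4 & 3 & 1 & -1 & 5 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 1 & -3 & 0 \end{array}\right]
\xrightarrow{R^4 - R^3}
\left[\begin{array}{ccccc|c} 1 & 4 & 3 & 1 & -1 & 5 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] \quad (*) \\
&\xrightarrow{R^1 - R^3}
\left[\begin{array}{ccccc|c} 1 & 4 & 3 & 0 & 2 & 5 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]
\xrightarrow{R^1 - 3R^2}
\left[\begin{array}{ccccc|c} 1 & 4 & 0 & 0 & 2 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right].
\end{aligned}
\]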


The starred matrix is in row-echelon form, with leading 1’s in positions
(1,1), (2,3), and (3,4). The basic variables are therefore $x^1$, $x^3$, and $x^4$;
the remaining variables, $x^2$ and $x^5$, are free. To parametrize the solution
space, write $x^1 = 2 - 4x^2 - 2x^5$, $x^3 = 1$, $x^4 = 3x^5$, so


\[
\begin{bmatrix} x^1 \\ x^2 \\ x^3 \\ x^4 \\ x^5 \end{bmatrix}
= \begin{bmatrix} 2 - 4x^2 - 2x^5 \\ x^2 \\ 1 \\ 3x^5 \\ x^5 \end{bmatrix}
= \underbrace{\begin{bmatrix} 2 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}}_{x_p}
+ \underbrace{x^2 \begin{bmatrix} -4 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x^5 \begin{bmatrix} -2 \\ 0 \\ 0 \\ 3 \\ 1 \end{bmatrix}}_{x_h}.
\]

Here again, $x_p = (2, 0, 1, 0, 0)$ is a particular solution, and the remaining
portion is the general homogeneous solution.


Uniqueness of Reduced Row-Echelon Form 


Theorem 1.42. Let $A$ be an $m \times n$ matrix. If $A_1'$ and $A_2'$ are matrices
obtained by putting $A$ into reduced row-echelon form, then $A_1' = A_2'$.


Proof.* If an $m \times (k+1)$ matrix $A$ is in reduced row-echelon form, then
the $m \times k$ matrix $B$ obtained by removing the $(k+1)$th column is in
reduced row-echelon form: Removing the last column cannot create a
leading 1 or remove a row of 0s.

We proceed by induction on the number of columns. If A has one 
column, the theorem is obvious. 

Assume inductively for some $k \geq 1$ that every $m \times k$ matrix has a
unique reduced row-echelon form. Let $A$ be $m \times (k+1)$, and suppose
$A_1'$ and $A_2'$ are reduced row-echelon forms of $A$. Let $B$, $B_1'$, and $B_2'$ be
the $m \times k$ matrices obtained by removing the $(k+1)$th column of $A$, $A_1'$,
and $A_2'$, respectively.

Performing a sequence of row operations and then removing the
$(k+1)$th column has the same effect as removing the $(k+1)$th column
and performing the same sequence of row operations. Consequently,
$B_1'$ and $B_2'$ are reduced row-echelon forms of $B$, hence are equal by the
inductive hypothesis. It suffices to prove $A_1'$ and $A_2'$ have the same
$(k+1)$th column.

Because $A_1'$ and $A_2'$ are obtained from $A$ by elementary row operations,
the homogeneous systems $A_1' x = 0$ and $A_2' x = 0$ have the same
solution sets. Let $\ell$ be the number of leading 1’s in $B_1'$, i.e., the number
of non-zero rows after deleting the $(k+1)$th column of $A_1'$.

If $A_1'$ has a leading 1 in the $(k+1)$th column, then the $(\ell+1)$th
row of $A_1' x = 0$ reads $x^{k+1} = 0$. Since $A_2' x = 0$ has the same solution
set, $A_2'$ also has a leading 1 in the $(k+1)$th column, necessarily in
the $(\ell+1)$th row. Since each leading 1 is the only non-zero entry in
its column, $A_1' = A_2'$. (This is the only step where the hypothesis of
reduced row-echelon form is used.)

If instead $A_1'$ does not have a leading 1 in the $(k+1)$th column,
then there exists an $x = (x^1, \dots, x^k, 1)$ satisfying $A_1' x = 0 = A_2' x$, and
therefore $(A_1' - A_2')x = 0$. However, the first $k$ columns of $A_1' - A_2'$
are zero, so $0^m = (A_1' - A_2')x$ is the $(k+1)$th column of $A_1' - A_2'$, and
again $A_1' = A_2'$.

In either case, we have shown that a general $m \times (k+1)$ matrix $A$
has a unique reduced row-echelon form. This establishes the inductive
step, and completes the proof.

*Adapted from Thomas Yuster, The Reduced Row Echelon Form of a Matrix is
Unique: A Simple Proof, Mathematics Magazine, 57, 2 (1984), 93-94.


1.3 Criteria for Invertibility 


The basic properties of invertibility “at the level of matrices” are given 
by Theorem 1.30. In practice, we need algorithmic criteria for detecting 
invertibility of specific matrices, and methods for calculating inverses of 
invertible matrices. These algorithms necessarily operate “at the level 
of entries”. 

A $1 \times 1$ matrix is essentially a real number, $A = [A^1_1]$, and is invertible
if the single entry is non-zero, in which case $A^{-1} = [1/A^1_1]$. A general
$2 \times 2$ matrix admits a similar criterion for invertibility and a formula
for the inverse:


Proposition 1.43. A $2 \times 2$ matrix $A = [A^i_j]$ is invertible if and only
if the quantity $\Delta = A^1_1 A^2_2 - A^1_2 A^2_1$ is non-zero, in which case

\[
A^{-1} = \frac{1}{\Delta} \begin{bmatrix} A^2_2 & -A^1_2 \\ -A^2_1 & A^1_1 \end{bmatrix}.
\]

Proof. The rows are proportional if and only if $\Delta = 0$, if and only if the
row-echelon form has a row of zeros, if and only if $A$ is not invertible.
The formula is proven by verifying $AA^{-1} = I_2$ and $A^{-1}A = I_2$.


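Remark 1.44. This formula for the inverse of a $2 \times 2$ matrix should be
memorized. It may help to write


\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]


The quantity $\Delta = ad - bc$, the determinant of $A$, is studied in Chapter 3.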
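The formula of Remark 1.44 translates directly into a few lines of code. The sketch below assumes exact rational arithmetic via Python's `fractions` module; the function name `inverse_2x2` and the convention of returning `None` for a non-invertible matrix are our own choices.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] by the 2 x 2 formula, or None when ad - bc = 0."""
    det = Fraction(a) * d - Fraction(b) * c
    if det == 0:
        return None  # the rows are proportional, so the matrix is not invertible
    # Swap the diagonal entries, negate the off-diagonal entries, divide by ad - bc.
    return [[d / det, -b / det],
            [-c / det, a / det]]
```

For instance, `inverse_2x2(1, 2, 2, 4)` returns `None`, since the second row is twice the first.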


There are analogous formulas for larger matrices, but they are pro- 
hibitively complicated for practical use. Instead we develop a row- 
reduction algorithm for inverting an n x n matrix (or detecting that 
the matrix is not invertible).


Elementary Matrices


Definition 1.45. Let $A$ be an $m \times n$ matrix. An $m \times m$ matrix $E$ is
an elementary matrix if the product $EA$ gives the result of performing
an elementary row operation on $A$.


We will show by construction that elementary matrices exist for each 
type of row operation. The following “principle” is useful for writing 
down elementary matrices in practice. The proof is immediate from 
the definition by taking $A = I_m$.


Proposition 1.46. An elementary matrix is obtained by performing the
corresponding row operation on the identity matrix $I_m$, and is invertible.


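Example 1.47. The following $3 \times 3$ matrices (i) add $c$ times the third
row to the second ($R^2 + cR^3$); (ii) multiply the second row by $c$ ($cR^2$);
and (iii) swap the second and third rows ($R^2 \leftrightarrow R^3$).


\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
\]


To see these matrices have the expected effect, multiply by a general
matrix $A$, expressed as a block matrix whose entries are rows, e.g.,


\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} A^1 \\ A^2 \\ A^3 \end{bmatrix}
= \begin{bmatrix} A^1 \\ A^2 + cA^3 \\ A^3 \end{bmatrix}; \qquad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} A^1 \\ A^2 \\ A^3 \end{bmatrix}
= \begin{bmatrix} A^1 \\ A^3 \\ A^2 \end{bmatrix}.
\]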


To construct elementary matrices for each type of row operation,
let $(e_i)_{i=1}^m$ and $(e^j)_{j=1}^m$ be the standard basis of $\mathbf{R}^m$ and the standard
dual basis of $(\mathbf{R}^m)^*$.

If $i$ and $j$ are indices between 1 and $m$, recall that $e_i^j = e_i e^j$ is the
$m \times m$ matrix with a 1 in the $(i,j)$ entry and 0’s elsewhere.


Remark 1.48. The product $e^j A$ is the $j$th row of $A$, and $e_i^j A = e_i e^j A$
is the $m \times n$ matrix whose $i$th row is the $j$th row of $A$, and whose other
entries are zero.


I. If $c$ is real and $i \neq j$, then $E = I_m + c\,e_i^j$ implements the type I. elementary
row operation $R^i + cR^j$: If $A$ is an arbitrary $m \times n$ matrix, then
$(I_m + c\,e_i^j)A = A + c\,e_i^j A$ is the result of adding $c$ times the $j$th row
of $A$ to the $i$th row of $A$.

The diagonal entries $E^k_k$ are all 1, the $(i,j)$ entry is $c$, and all other
entries of $E$ are 0. The inverse operation is to subtract $c$ times the
$j$th row from the $i$th row: $E^{-1} = (I_m + c\,e_i^j)^{-1} = I_m - c\,e_i^j$. Since
$(e_i^j)(e_i^j) = 0^{m \times m}$, the inverse relationship


\[
(I_m + c\,e_i^j)(I_m - c\,e_i^j) = I_m
\]


is a difference of squares factorization; compare Exercise 1.4.


II. If $c \neq 0$ is real, then $E = I_m + (c - 1)e_i^i$ implements the type II.
elementary row operation $cR^i$: $(I_m + (c - 1)e_i^i)A$ is the result of
multiplying the $i$th row of $A$ by $c$.


The matrix $E$ is diagonal, with $E^i_i = c$ and all other diagonal entries
$E^k_k$ equal to 1. The inverse operation $\frac{1}{c}R^i$ is to multiply the
$i$th row by $1/c$.


III. If $i \neq j$, then $E = I_m - (e_i - e_j)(e^i - e^j)$ implements the type III.
elementary row operation $R^i \leftrightarrow R^j$: $EA$ is the result of swapping
the $i$th and $j$th rows of $A$.


If $\tau = (i\ j)$ is the transposition exchanging $i$ and $j$, then $E = [\delta^{\tau(k)}_\ell]$;
the $(k,k)$ entry is 1 if $k \neq i$ and $k \neq j$, while the $(i,j)$ and $(j,i)$
entries are 1, and all other entries are 0. This matrix is its own
inverse.
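The three constructions can be tried out numerically. The sketch below follows Proposition 1.46 by performing each row operation on the identity matrix; the function names (`identity`, `matmul`, `add_multiple`, `scale`, `swap`) are our own, matrices are assumed to be lists of rows, and row indices are 1-based to match the text.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]

def matmul(A, B):
    """Ordinary matrix product of A (m x p) and B (p x n)."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def add_multiple(m, i, j, c):
    """Type I: I_m + c e_i^j, which adds c times row j to row i (i != j)."""
    E = identity(m)
    E[i - 1][j - 1] = c
    return E

def scale(m, i, c):
    """Type II: I_m + (c - 1) e_i^i, which multiplies row i by c (c != 0)."""
    E = identity(m)
    E[i - 1][i - 1] = c
    return E

def swap(m, i, j):
    """Type III: I_m - (e_i - e_j)(e^i - e^j), which exchanges rows i and j."""
    E = identity(m)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E
```

Multiplying `add_multiple(3, 2, 3, c)` by `add_multiple(3, 2, 3, -c)` gives the identity, which is the difference of squares factorization noted for type I.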


An Algorithm for Matrix Inversion 


A sequence of row operations reducing a matrix $A$ to echelon form $A'$
may be viewed as a factorization $A' = E_r E_{r-1} \cdots E_1 A$, or after
rearranging, $A = E_1^{-1} E_2^{-1} \cdots E_r^{-1} A'$, of $A$ into a product of elementary
matrices and a single row-echelon matrix. This observation leads to a
criterion for invertibility, and to an algorithm for computing $A^{-1}$.

Theorem 1.49. If $A$ is an $m \times m$ matrix, the following are equivalent:

(i) $A$ can be row-reduced to the identity matrix $I_m$.

(ii) $A$ is invertible.


Proof. If $A$ can be row-reduced to the identity matrix, then $A$ is a
product of elementary matrices, and hence invertible.

Conversely, if $A$ cannot be reduced to the identity matrix, then the
reduced row-echelon form of $A$ has fewer than $m$ leading 1’s, and therefore
has a row of 0’s. It follows that the homogeneous system $Ax = 0^m$
has a non-trivial solution $x_0$. If $B$ is an arbitrary $m \times m$ matrix, then
$(BA)x_0 = B(Ax_0) = B0^m = 0^m \neq x_0$, so $BA \neq I_m$. That is, $A$ has
no inverse.


The algorithm for computing $A^{-1}$ uses this result. First form the
$m \times (2m)$ matrix $[A \mid I_m]$ by “augmenting” with the identity matrix.
Now row reduce the left-hand block, applying the row operations all the
way across. If the left-hand block is row-reduced to $I_m$, the right-hand
block is $A^{-1}$; if the left-hand block acquires a row of 0’s, then $A$ is not
invertible.

This works because a sequence of row operations reducing A to the 
identity may be viewed as a factorization 


\[
E_r E_{r-1} \cdots E_1 [A \mid I_m] = [I_m \mid B].
\]


Equality of the right-hand block says $B = E_r E_{r-1} \cdots E_1$; equality of
the left-hand block says $BA = I_m$. But $B$ is a product of elementary
matrices, and therefore invertible. By Theorem 1.30 (ii), $AB = I_m$, so
$B = A^{-1}$.
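The inversion algorithm can be sketched as follows. This is an illustrative implementation under our own assumptions: the name `invert` is ours, matrices are lists of rows, arithmetic is exact via `fractions`, and `None` signals that the left-hand block cannot be reduced to the identity.

```python
from fractions import Fraction

def invert(rows):
    """Row-reduce [A | I_m]; return the right-hand block A^{-1}, or None."""
    m = len(rows)
    # Augment each row of A with the corresponding row of the identity matrix.
    M = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(m)]
         for r, row in enumerate(rows)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if M[r][col] != 0), None)
        if pivot is None:
            return None  # fewer than m leading 1's: A is not invertible
        M[col], M[pivot] = M[pivot], M[col]        # type III
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]        # type II
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]  # type I
    # The left-hand block is now I_m, so the right-hand block is the inverse.
    return [row[m:] for row in M]
```

Every row operation is applied all the way across the augmented matrix, exactly as the description above requires.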


Exercises 


Exercise 1.1. There are five elementary row operations on $2 \times 2$ matrices.
List them, and for each, write down the corresponding elementary
matrix $E$ and its inverse $E^{-1}$, and multiply matrices to check that
$EE^{-1} = I_2$. (Four of the “$E$”s will contain an arbitrary constant $c$.)


Exercise 1.2. Write down the 4 x 4 elementary matrices for the indi- 
cated operations. 


(a) Adding c times the first or third row to the second row, or to the 
fourth row. (Four matrices in all.) 


(b) Exchanging the first, second, or third row with the fourth row. 


Exercise 1.3. Use the row-reduction algorithm to derive the formula
for the inverse of a $2 \times 2$ matrix. (For simplicity, you may use the
notation of Remark 1.44, and assume $a \neq 0$.)

Exercise 1.4. If $i$, $j$, $k$, and $\ell$ are indices between 1 and $m$, show that


\[
e_i^j e_k^\ell = \delta^j_k\, e_i^\ell =
\begin{cases} e_i^\ell & \text{if } j = k, \\ 0^{m \times m} & \text{if } j \neq k. \end{cases}
\]


Referring to Example 1.24, use this formula to compute AB and BA, 
noting that A = el and B= ef. 
Suggestion: Use the results of Examples 1.17 and 1.18. 


Exercise 1.5. Suppose $A = [A^i_j]$ is an $m \times n$ matrix, $x = [x^j]$ is an
ordered $n$-tuple of “inputs”, and $y = Ax = [y^i]$ is the corresponding
ordered $m$-tuple of “outputs” upon multiplication by $A$.

Show that the entry $A^i_j$ is the “coefficient of sensitivity of $y^i$ with
respect to $x^j$”, in the sense that if the $j$th input $x^j$ is incremented by
an amount $\Delta x^j$ (and the remaining inputs are held fixed), then the
$i$th output $y^i$ changes by $\Delta y^i = A^i_j\, \Delta x^j$.


Exercise 1.6. Let A be m x p and B be p x n. 


(a) If the ith row of A is 0, what (if anything) can be deduced about 
the product AB? What if the jth column of A is 0? 


(b) If the ith row of B is 0, what (if anything) can be deduced about 
the product AB? What if the jth column of B is 0? 


Exercise 1.7. If $A = \sum_i a^i e_i^i$ and $B = \sum_j b^j e_j^j$ are $n \times n$ diagonal
matrices, prove $AB = BA$. (Part of the exercise is to cope with
reindexing, but be sure you understand this result conceptually.)


Exercise 1.8. An $n \times n$ matrix $A$ is symmetric if $A^{\mathsf{T}} = A$. Prove that
if $A$ and $B$ are symmetric $n \times n$ matrices, then $AB$ is symmetric if and
only if $BA = AB$.

Hint: Use Theorem 1.26. 


Exercise 1.9. If $A = [A^i_j]$ and $B = [B^i_j]$ are $n \times n$ matrices, their
commutator is defined to be $[A, B] = AB - BA$. In particular, $A$ and $B$
commute if and only if $[A, B] = 0^{n \times n}$.

(a) Show that the $(i,j)$ entry of $[A, B]$ is
\[
[A, B]^i_j = \sum_{k=1}^n \left( A^i_k B^k_j - A^k_j B^i_k \right).
\]


AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31 




(b) Show that if $A$, $B_1$, and $B_2$ are $n \times n$ matrices and $c$ is real, then
$[A, cB_1 + B_2] = c[A, B_1] + [A, B_2]$.
Suggestion: Work at the level of matrices, not at the level of entries.


Exercise 1.10. Let $A = [A^i_j]$ be an $n \times n$ matrix, and $e_k^\ell = e_k e^\ell$ the
$n \times n$ matrix with a 1 in the $(k, \ell)$ entry and 0’s elsewhere.

(a) Show that $[A, e_k^\ell] = 0^{n \times n}$ if and only if $A^i_k \delta^\ell_j = \delta^i_k A^\ell_j$ for all $i$ and $j$.
Hint: The $(i,j)$ entry of $e_k^\ell$ is $\delta^i_k \delta^\ell_j$.

(b) Show that $AB = BA$ for every $B$ in $\mathbf{R}^{n \times n}$ if and only if $A$ is a
scalar matrix, i.e., there exists a real number $c$ such that $A = cI_n$.

Hint: One direction is easy. For the converse, use part (a) and the
fact that $A$ commutes with every $e_k^\ell$.


Exercise 1.11. Let $A = [A^i_j]$ be an $m \times n$ matrix, and let $A_j$ denote
the $j$th column.


(a) Write down three types of “elementary column operation” analogous
to elementary row operations.


(b) For each type of column operation, find an $n \times n$ matrix $E$ such
that $AE$ is the result of performing that operation on $A$.


(c) Does there exist an $m \times m$ matrix $E$ such that $EA$ implements an
elementary column operation? Explain.


Exercise 1.12. Let $n \geq 1$ be an integer. Show that the set $GL(n, \mathbf{R})$ of
invertible, $n \times n$ real matrices is a group under matrix multiplication.*
Suggestion: Use Theorem 1.30.

*The notation $GL(n, \mathbf{R})$ stands for the general linear group of size $n$, with real
entries.


Exercise 1.13. Let $\sigma$ be a bijection of the set $\{1, 2, \dots, n\}$, i.e., a
permutation. The matrix $P_\sigma$ whose $(i,j)$ entry is


\[
(P_\sigma)^i_j = \delta^{\sigma(i)}_j =
\begin{cases} 1 & \text{if } j = \sigma(i), \\ 0 & \text{otherwise,} \end{cases}
\]


and which therefore has precisely one 1 in every row and column, is the
associated permutation matrix.


(a) Write out the two $2 \times 2$ permutations and the associated matrices,
and the six $3 \times 3$ permutations and associated matrices.


(b) If $x = \sum_j x^j e_j$, show that $P_\sigma x = \sum_j x^{\sigma(j)} e_j$. Verify explicitly
(by multiplying matrices) for the six $3 \times 3$ permutation matrices.


(c) Prove that $P_\sigma$ is invertible, and that $(P_\sigma)^{-1} = P_{\sigma^{-1}} = P_\sigma^{\mathsf{T}}$.


(d) Prove that if $\sigma$ and $\tau$ are permutations, then $P_\tau P_\sigma = P_{\tau\sigma}$. (That
is, the mapping $\sigma \mapsto P_\sigma$ is an injective group homomorphism
from the symmetric group to $GL(n, \mathbf{R})$.)

Exercise 1.14. Let $A = [A^i_j]$ be $2 \times 2$, $b = (b^1, b^2)$, $x = (x^1, x^2)$, and
$0 = 0^{1 \times 2}$. Evaluate the product

\[
\begin{bmatrix} A^1_1 & A^1_2 & b^1 \\ A^2_1 & A^2_2 & b^2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x^1 \\ x^2 \\ 1 \end{bmatrix}
\]

in two ways: (i) as an ordinary product, and (ii) as a “product of block
matrices”. Show the results are compatible in an obvious sense.


Vector Spaces


2.1 Vector Space Axioms 


Definition 2.1. A real vector space $(V, +, \cdot)$ comprises a non-empty
set $V$, a binary operation $+ : V \times V \to V$, and a scalar multiplication
map $\cdot : \mathbf{R} \times V \to V$ satisfying, for all $v_1$, $v_2$, and $v_3$ in $V$ and all real
numbers $c_1$, $c_2$,


(i) (Associativity of $+$) $(v_1 + v_2) + v_3 = v_1 + (v_2 + v_3)$;
(ii) (Commutativity of $+$) $v_2 + v_1 = v_1 + v_2$;
(iii) (Identity element) There exists a zero vector $0$ in $V$ such that
$0 + v = v = v + 0$ for all $v$ in $V$;
(iv) (Additive inverses) For every $v$ in $V$, there exists an element $-v$
such that $v + (-v) = 0 = (-v) + v$;
(v) (“Associativity” of $\cdot$) $(c_1 c_2) \cdot v = c_1 \cdot (c_2 \cdot v)$;
(vi) (Left distributivity) $(c_1 + c_2) \cdot v = c_1 \cdot v + c_2 \cdot v$;
(vii) (Right distributivity) $c \cdot (v_1 + v_2) = c \cdot v_1 + c \cdot v_2$;
(viii) (Normalization) $1 \cdot v = v$.

Remark 2.2. The pair (V,+) is an Abelian group, on which the real 
numbers “act” by multiplication. Generally, scalars may (for example) 
be complex numbers, and one might speak of a complex vector space. In 
this book, vector spaces are real unless the contrary is stated explicitly. 


Remark 2.3. The operations $+$ and $\cdot$ are part of the definition of a vector
space. However, it is often convenient (and not ambiguous) to speak of 
“a vector space V”, suppressing explicit mention of the operations. 


Remark 2.4. For emphasis, or when considering more than one vector
space, we may write $0^V$ to signify the zero vector of $V$.


Proposition 2.5. Let $(V, +, \cdot)$ be a vector space. For every $v$ in $V$:
(i) $0 \cdot v = 0$;
(ii) $(-1) \cdot v = -v$.

Proof. Since $0 + 0 = 0$, the left distributive law gives

\[
0 \cdot v + 0 \cdot v = (0 + 0) \cdot v = 0 \cdot v.
\]

Canceling $0 \cdot v$ gives $0 \cdot v = 0$.
Now, since $0 = 1 + (-1)$, the left distributive law gives

\[
0 = 0 \cdot v = (1 + (-1)) \cdot v = 1 \cdot v + (-1) \cdot v.
\]

But $1 \cdot v = v$ by normalization, so $0 = v + (-1) \cdot v$. By definition of
additive inverses, $-v = (-1) \cdot v$.


Remark 2.6. The scalar multiplication dot is often omitted in practice.
When calculating, we freely use $1v = v$, $0v = 0$, and $(-1)v = -v$.


Function Spaces 


Example 2.7. Let X be a non-empty set. The set F(X,R) of all 
real-valued functions on X becomes a vector space under “pointwise 
addition and scalar multiplication”, i.e., 


\[
(f + g)(x) = f(x) + g(x), \qquad (cf)(x) = c\,(f(x)).
\]


The zero vector is the function whose value at each point is 0; the
additive inverse of a function $f$ is the function $(-f)(x) = -f(x)$. The
axioms are immediate; a function $f$ with domain $X$ is an “$X$-tuple” of
real numbers, i.e., a collection of real numbers $f(x)$ “indexed by” the
points of $X$, and each vector space axiom corresponds to a property of
addition or multiplication of real numbers.

For illustration, here is a proof of the right distributive law. If
$f_1$ and $f_2$ are arbitrary functions on $X$ and $c$ is a real number, then the
value of $c(f_1 + f_2)$ at $x$ is


\[
\begin{aligned}
\bigl[c(f_1 + f_2)\bigr](x) &= c\bigl[(f_1 + f_2)(x)\bigr] && \text{Defn. of scalar multiplication} \\
&= c\bigl[f_1(x) + f_2(x)\bigr] && \text{Defn. of vector addition} \\
&= c f_1(x) + c f_2(x) && \text{Distrib. law for real numbers} \\
&= (cf_1)(x) + (cf_2)(x) && \text{Defn. of scalar multiplication} \\
&= \bigl[(cf_1) + (cf_2)\bigr](x) && \text{Defn. of vector addition.}
\end{aligned}
\]


Since the functions $c(f_1 + f_2)$ and $(cf_1) + (cf_2)$ have the same value at
every $x$ in $X$, they are the same function.
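Pointwise operations are easy to model with functions as values. The sketch below is only an illustration: the helper names `add` and `scale` are ours, the sample functions are arbitrary choices, and a finite check at integer points is of course not a proof of the right distributive law, merely a sanity check of the computation above.

```python
def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(c, f):
    """Pointwise scalar multiple: (c f)(x) = c * f(x)."""
    return lambda x: c * f(x)

# Two sample functions on the real line (integer inputs keep the arithmetic exact).
f1 = lambda x: x * x
f2 = lambda x: 2 * x + 1

lhs = scale(3, add(f1, f2))            # c(f1 + f2)
rhs = add(scale(3, f1), scale(3, f2))  # (c f1) + (c f2)
```

Comparing `lhs(x)` with `rhs(x)` at each sample point mirrors the statement that two functions are equal when they have the same value at every point.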


Example 2.8. If $\mathbf{N} = \{0, 1, 2, 3, \dots\}$ denotes the set of natural numbers,
we write $F(\mathbf{N}, \mathbf{R}) = \mathbf{R}^{\mathbf{N}}$. An element of $\mathbf{R}^{\mathbf{N}}$ is effectively an
infinite ordered list of real numbers $(a_k)_{k=0}^{\infty}$, i.e., a real sequence.


Column Vectors and Matrices 


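Example 2.9. Let $n$ be a positive integer, $X = \{1, 2, \dots, n\}$ a set with
$n$ elements. Because a function $f : X \to \mathbf{R}$ is essentially an ordered
$n$-tuple of real numbers $f(1)$, ..., $f(n)$, we denote $F(X, \mathbf{R})$ by $\mathbf{R}^n$.

We write elements of $\mathbf{R}^n$ as “column vectors”. The vector space
operations are defined componentwise:


\[
\begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix} + \begin{bmatrix} y^1 \\ \vdots \\ y^n \end{bmatrix}
= \begin{bmatrix} x^1 + y^1 \\ \vdots \\ x^n + y^n \end{bmatrix}, \qquad
c \begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix} = \begin{bmatrix} cx^1 \\ \vdots \\ cx^n \end{bmatrix},
\]

or simply $[x^j] + [y^j] = [x^j + y^j]$, $c[x^j] = [cx^j]$ for brevity. The zero
vector of $\mathbf{R}^n$ is denoted $0^n$.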


Remark 2.10. The superscripts denote row indices, not exponents. 


Remark 2.11. When we denote an element of $\mathbf{R}^n$ by $[x^j]$, “$j$” is a
dummy index, having no meaning outside the brackets. We may use
any convenient letter without changing the meaning: $[x^i] = [x^j] = [x^k]$,
etc.

When we speak of a component $x^j$, by contrast, $j$ has a specific
(possibly implicit) numerical value between 1 and $n$.


Example 2.12. If $X = \{1, 2, \dots, m\}$ and $Y = \{1, 2, \dots, n\}$, the Cartesian
product $X \times Y$ consists of all ordered pairs $(i, j)$ with $1 \leq i \leq m$
and $1 \leq j \leq n$. A real-valued function $f : X \times Y \to \mathbf{R}$ is essentially a
doubly-indexed family of numbers $f(i, j) = A^i_j$, namely, a real $m \times n$
matrix. That is, $F(X \times Y, \mathbf{R}) = \mathbf{R}^{m \times n}$.

Addition and scalar multiplication of functions correspond to the
“entrywise” operations of matrix addition and scalar multiplication, so
$(\mathbf{R}^{m \times n}, +, \cdot)$ is a vector space.

The zero matrix $0^{m \times n}$ in $\mathbf{R}^{m \times n}$ is the identity element for addition.
The negative of a matrix $A = [A^i_j]$ is $-A = [-A^i_j]$.


Polynomials 


Example 2.13. Let $n$ be a non-negative integer, $t$ an indeterminate,
and let $t^n$ denote the $n$th power of $t$. If $a_0$, $a_1$, ..., $a_n$ are real numbers,
the expression


\[
p(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n
\]


is called a polynomial with coefficients $a_0$, ..., $a_n$. The degree of $p$ is
the largest index with non-zero coefficient. If all coefficients are 0, we
define the degree to be $-\infty$.

We define polynomial addition and scalar multiplication by adding
or multiplying coefficients. That is, if $q(t) = b_0 + b_1 t + \cdots + b_n t^n$, and
if $c$ is real, we define


\[
\begin{aligned}
(p + q)(t) &= (a_0 + b_0) + (a_1 + b_1)t + \cdots + (a_n + b_n)t^n, \\
(cp)(t) &= ca_0 + ca_1 t + ca_2 t^2 + \cdots + ca_n t^n.
\end{aligned}
\]


The set of all polynomials of degree at most $n$ is denoted $P_n$. It is
straightforward (if somewhat tedious) to check all the vector space ax- 
ioms. When we study “subspaces”, we will see that only two conditions 
require verification, and these are encoded by the fact that a sum or 
scalar multiple of polynomials is itself a polynomial. 
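Since a polynomial is determined by its coefficients, the operations of Example 2.13 can be modeled on coefficient lists. The sketch below assumes the representation `[a_0, a_1, ..., a_n]` and uses our own function names (`poly_add`, `poly_scale`, `degree`); shorter lists are padded with zeros so that polynomials of different degrees can be added.

```python
def poly_add(p, q):
    """Add polynomials given as coefficient lists [a_0, a_1, ..., a_n]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad with zero coefficients
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Multiply every coefficient by the scalar c."""
    return [c * a for a in p]

def degree(p):
    """Largest index with non-zero coefficient; -infinity for the zero polynomial."""
    nonzero = [k for k, a in enumerate(p) if a != 0]
    return max(nonzero) if nonzero else float("-inf")
```

Adding `[1, 0, 1]` and `[0, 0, -1]` gives a polynomial of degree 0, which illustrates why the natural vector space is "degree at most $n$" and not "degree exactly $n$".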


Example 2.14. If $a_0$, $a_1$, ..., $a_n$ and $b_1$, ..., $b_n$ are real numbers, the
expressions

\[
\begin{aligned}
C(t) &= a_0 + a_1 \cos t + a_2 \cos(2t) + \cdots + a_n \cos(nt), \\
S(t) &= b_1 \sin t + b_2 \sin(2t) + \cdots + b_n \sin(nt),
\end{aligned}
\]

are called the cosine polynomial with coefficients $a_0$, ..., $a_n$ and the
sine polynomial with coefficients $b_1$, ..., $b_n$.

Two cosine polynomials are added, and a cosine polynomial is mul- 
tiplied by a scalar, as if they were functions on X = R. The set of 
cosine polynomials turns out to be a vector space under these opera- 
tions; the proof boils down to the fact that a sum of cosine polynomials 
is a cosine polynomial, and a scalar multiple of a cosine polynomial is 
a cosine polynomial. 

Corresponding remarks for sine polynomials are true. 


The Geometry of Vector Operations 


The vector space $(\mathbf{R}^2, +, \cdot)$ may be viewed as the Cartesian plane, a
plane equipped with a distinguished pair of perpendicular lines (the
coordinate axes) meeting at the origin. A point of $\mathbf{R}^2$ is an ordered
pair $x = (x^1, x^2)$, and is located by treating $x^1$ as a horizontal (east-west)
position and $x^2$ as a vertical (north-south) position. The origin
is the zero vector, $0 = (0, 0)$.


Figure 2.1: Vector addition and scalar multiplication in R?. 


As a vector, $x$ may be viewed as an arrow with tail at $0$ and tip at $x$.
A vector sum $x + y$ is visualized by the parallelogram law: construct the
parallelogram with a vertex at $0$ and having sides $x$ and $y$; the fourth
corner is $x + y$. Scalar multiplication $cx$ is visualized by “scaling” $x$
by a factor of $c$.

Example 2.15. In Figure 2.1, x = (2,1) and y = (—1, 5) are plotted
on a Cartesian grid of unit squares, along with several sums of scalar
multiples. For example, x + y = (1, 3) and x + 2y = (2,0).


Example 2.16. Analogous pictures can be drawn in more abstract
vector spaces. For example, let $V = F(\mathbf{R}, \mathbf{R})$ be the set of real-valued
functions on $\mathbf{R}$, regarded as a vector space with ordinary addition
and scalar multiplication of functions. The functions $f_1(x) = \cos^2 x$,
$f_2(x) = \sin^2 x$, $g_1(x) = 1$, and $g_2(x) = \cos(2x)$ may each be viewed as
an arrow, and any two of these functions “span a plane”.

The trigonometric identities

\[
\cos^2 x + \sin^2 x = 1, \qquad \cos^2 x - \sin^2 x = \cos(2x)
\]

may be interpreted schematically as vector sums. In particular, all four
functions lie in a plane in $V$, since $g_1 = f_1 + f_2$ and $g_2 = f_1 - f_2$.

[Diagram: arrows labeled $\cos^2 x$, $\sin^2 x$, $-\sin^2 x$, $1$, and $\cos(2x)$.]

The identities $\cos^2 x = \frac{1}{2}(1 + \cos 2x)$ and $\sin^2 x = \frac{1}{2}(1 - \cos 2x)$ may
also be interpreted using this diagram. (Check the details yourself.)


Linear Combinations 


Definition 2.17. Let $(V, +, \cdot)$ be a vector space, and $v_1$, $v_2$, ..., $v_p$
be distinct elements of $V$. If $x^1$, $x^2$, ..., $x^p$ are real numbers, the
expression

\[
x^1 v_1 + x^2 v_2 + \cdots + x^p v_p = \sum_{j=1}^{p} x^j v_j,
\]

representing an element of $V$, is called the linear combination of the $v_j$
with coefficients $x^j$. We say the linear combination is non-trivial if at
least one coefficient $x^j$ is non-zero.


Remark 2.18. If a coefficient is zero, the corresponding summand may 
as well be omitted. With this convention, a “non-trivial” linear com- 
bination may be regarded as a “non-empty” linear combination, i.e., 
as having at least one summand. An empty linear combination has
value $0^V$.

Example 2.19. If $x = [x^j] \in \mathbf{R}^n$, and $(e_j)_{j=1}^n$ denotes the standard
basis of $\mathbf{R}^n$ (Definition 1.7), then


\[
x = \sum_{j=1}^{n} x^j e_j, \quad \text{i.e.,} \quad
\begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{bmatrix}
= x^1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
+ x^2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}
+ \cdots
+ x^n \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.
\]


In words, every column $x$ in $\mathbf{R}^n$ is assembled as a linear combination
from the standard basis using the components of $x$ as coefficients.

Multiplying by a standard dual basis element $e^k$ selects the $k$th component.
By Theorem 1.19,


\[
e^k x = e^k \Bigl( \sum_{j=1}^{n} x^j e_j \Bigr) = \sum_{j=1}^{n} x^j e^k e_j = \sum_{j=1}^{n} x^j \delta^k_j = x^k.
\]


(Note the final step; summing $x^j \delta^k_j$ over $j$ selects the term $j = k$.)


Example 2.20. Every row $\xi = [\xi_j]$ in $(\mathbf{R}^n)^*$ may be written as a linear
combination $\sum_j \xi_j e^j$ of the standard dual basis $(e^j)_{j=1}^n$.

Example 2.21. If $A = [A_1 \ \dots \ A_n]$ is an $m \times n$ matrix (partitioned
into $n$ columns, each an element of $\mathbf{R}^m$) and $x = [x^j]$ is an element
of $\mathbf{R}^n$, then the matrix product $Ax = \sum_{j=1}^{n} x^j A_j$, an element of $\mathbf{R}^m$, is
precisely the linear combination of the columns of $A$ with coefficients $x^j$.
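The two descriptions of the product in Example 2.21 can be compared directly. In this sketch (function names `matvec` and `column_combination` are our own; a matrix is assumed to be a list of rows), the first function computes the product entry by entry, and the second assembles the same vector as a linear combination of columns.

```python
def matvec(A, x):
    """Row-by-row product: entry i is the sum over j of A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def column_combination(A, x):
    """The linear combination of the columns of A with coefficients x[j]."""
    m = len(A)
    result = [0] * m
    for j, xj in enumerate(x):
        column = [A[i][j] for i in range(m)]                   # the j-th column A_j
        result = [r + xj * a for r, a in zip(result, column)]  # add x^j * A_j
    return result
```

Both functions return the same element of $\mathbf{R}^m$ for every input; only the order of summation differs.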
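Example 2.22. Recall that the product $e_i^j = e_i e^j$ in $\mathbf{R}^{m \times n}$ has a 1 in
the $(i,j)$ entry and 0’s elsewhere. If $A = [A^i_j]$ is an arbitrary element
of $\mathbf{R}^{m \times n}$, then


\[
A = \sum_{i=1}^{m} \sum_{j=1}^{n} A^i_j\, e_i e^j = \sum_{i=1}^{m} \sum_{j=1}^{n} A^i_j\, e_i^j;
\]


compare Example 2.19. In particular, $I_n = \sum_{i=1}^{n} e_i e^i = \sum_{i=1}^{n} e_i^i$ in $\mathbf{R}^{n \times n}$.
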
Remark 2.23. A computational proof of the preceding claim illustrates
a couple of algebraic idioms. Let $A'$ denote the sum on the right. Since
$j$ is a dummy index in the expression for $A'$, we may call it $k$.

Now, if $j$ is an arbitrary index, $j = 1, \dots, n$, then

\[
A' e_j = \Bigl( \sum_{i=1}^{m} \sum_{k=1}^{n} A^i_k\, e_i^k \Bigr) e_j
= \sum_{i=1}^{m} \sum_{k=1}^{n} A^i_k\, e_i \delta^k_j
= \sum_{i=1}^{m} A^i_j\, e_i = A e_j.
\]

By Corollary 1.23, $A' = A$.


2.2 Subspaces


Definition 2.24. Let $(V, +, \cdot)$ be a vector space, and let $W \subseteq V$ be a
non-empty subset.

If $x + y \in W$ for all $x$ and $y$ in $W$, then $W$ is closed under addition.

If $cx \in W$ for all $x$ in $W$ and all real $c$, then $W$ is closed under
scalar multiplication.

If $(W, +, \cdot)$ is a vector space, we say $W$ is a (vector) subspace of $V$.


Remark 2.25. If a non-empty set $W$ in a vector space $(V, +, \cdot)$ is closed
under scalar multiplication, then $0^V \in W$: By hypothesis there is
some $x$ in $W$, so $0x = 0^V \in W$. Further, $W$ contains the additive
inverse of each of its elements: If $x \in W$, then $(-1) \cdot x = -x \in W$.


Theorem 2.26. Let $(V, +, \cdot)$ be a vector space, $W \subseteq V$ non-empty.
The following are equivalent:


(i) $W$ is closed under addition and under scalar multiplication.
(ii) For all $x$ and $y$ in $W$ and all real $c$, $cx + y \in W$.
(iii) $W$ is a subspace of $(V, +, \cdot)$.


Proof. ((i) if and only if (ii)). Assume $W$ is closed under addition and
scalar multiplication. If $x$ and $y$ are elements of $W$ and $c$ is real, then
$cx \in W$ since $W$ is closed under scalar multiplication, so $cx + y \in W$
since $W$ is closed under addition.

Conversely, if Condition (ii) holds, then taking $c = 1$ shows $W$ is
closed under addition. Taking $c = -1$ and $y = x$ shows $0^V = (-1)x + x \in W$,
and then taking $y = 0^V$ shows $W$ is closed under scalar multiplication.


((i) if and only if (iii)). If W is closed under addition in V, then the 
addition operator + is a binary operation on W itself, and is, a fortiori, 
associative and commutative on W. 

If W is closed under scalar multiplication, W contains the identity 
element of V as well as the additive inverse of each of its elements, 
by Remark 2.25. That is, the first four vector space axioms hold for 
(W,+,-). The last four axioms hold automatically. 

Conversely, if W is a subspace, then (W,+,-) is a vector space, so 
by definition W is closed under addition and scalar multiplication. 


Remark 2.27. If $(V, +, \cdot)$ is a vector space, $W \subseteq V$ is a subspace, and
if $v_1$, ..., $v_m$ are elements of $W$, then an arbitrary linear combination
of the $v_j$ is in $W$, by induction on the number of summands.

Example 2.28. Every vector space has two subspaces: The entire
space (which is “not proper”), and the trivial subspace $W = \{0\}$. A
subspace other than these is “proper and non-trivial”.


Example 2.29. Consider the vector space $(\mathbf{R}^2, +, \cdot)$, and write a general
element $x$ as $(x^1, x^2)$.

The set $W_1 = \{x : x^j \geq 0 \text{ for } j = 1, 2\}$, the closed first quadrant, is closed under
addition, since if $x$ and $y$ are in $W_1$, then $x + y = (x^1 + y^1, x^2 + y^2)$
has non-negative components. (A sum of non-negative real numbers is
non-negative.) This set is not closed under scalar multiplication. For
example, $(1, 1) \in W_1$, but $-1(1, 1) = (-1, -1)$ is not in $W_1$.

The set $W_2 = \{x : x^1 x^2 = 0\}$, the union of the coordinate axes,
is closed under scalar multiplication. If $x \in W_2$ and $c$ is real, then
$cx = (cx^1, cx^2)$ satisfies the membership criterion for $W_2$, since


\[
(cx^1)(cx^2) = c^2 (x^1 x^2) = c^2 (0) = 0.
\]


This set is not closed under addition: $x = (1, 0)$ and $y = (0, 1)$ are
in $W_2$, but their sum $x + y = (1, 1)$ is not in $W_2$.


[Figure: the sets $W_1$ and $W_2$. Left: $x \in W_1$, $-x \notin W_1$. Right: $x \in W_2$, $y \in W_2$, $x + y \notin W_2$.]


The set $W_3 = \{x : 5x^1 - 3x^2 = 0\}$ is closed under addition and
scalar multiplication. To prove this, let $x$ and $y$ be arbitrary elements
of $W_3$, and let $c$ be real. We wish to show that $cx + y$ satisfies the
defining equation of $W_3$. But


\[
5(cx^1 + y^1) - 3(cx^2 + y^2) = c(5x^1 - 3x^2) + (5y^1 - 3y^2) = 0 + 0 = 0,
\]


so $cx + y \in W_3$.

The set $W_4 = \{x : 5x^1 - 3x^2 = 1\}$ is not closed under either addition
or scalar multiplication. We give two proofs of each assertion. First, it
suffices to find counterexamples. If we take $x = (\frac{1}{5}, 0)$ and $y = (0, -\frac{1}{3})$,
then $x, y \in W_4$, but $x + y = (\frac{1}{5}, -\frac{1}{3})$ does not satisfy $5x^1 - 3x^2 = 1$,
so $x + y \notin W_4$.

For scalar multiplication, the zero vector is not in $W_4$, so as remarked
earlier, $W_4$ is not closed under scalar multiplication.

Alternatively, if $x$ and $y$ satisfy the defining equation of $W_4$, we
may add these equations, or multiply one of them by $c$:


\[
\begin{aligned}
1 &= 5x^1 - 3x^2, \\
1 &= 5y^1 - 3y^2, \\
2 &= 5(x^1 + y^1) - 3(x^2 + y^2),
\end{aligned}
\qquad
\begin{aligned}
1 &= 5x^1 - 3x^2, \\
c &= c(5x^1 - 3x^2) \\
&= 5(cx^1) - 3(cx^2).
\end{aligned}
\]


The left-hand side of the sum is $2 \neq 1$, so the pair $(x^1 + y^1, x^2 + y^2)$ is
not in $W_4$. The left-hand side of the product is $c$, so if $c \neq 1$, the pair
$(cx^1, cx^2)$ is not in $W_4$.
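The closure properties in Example 2.29 can be spot-checked numerically. The following sketch encodes each membership criterion as a predicate (the names `in_W1` through `in_W4`, `add`, and `scale` are ours) and uses exact fractions for the counterexample in $W_4$. A finite number of checks can refute closure but cannot prove it; the proofs are the arguments in the text.

```python
from fractions import Fraction

def in_W1(x):
    return x[0] >= 0 and x[1] >= 0        # closed first quadrant

def in_W2(x):
    return x[0] * x[1] == 0               # union of the coordinate axes

def in_W3(x):
    return 5 * x[0] - 3 * x[1] == 0       # line through the origin

def in_W4(x):
    return 5 * x[0] - 3 * x[1] == 1       # parallel line missing the origin

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(c, x):
    return (c * x[0], c * x[1])

# The counterexample for W4 used in the text.
x = (Fraction(1, 5), 0)
y = (0, Fraction(-1, 3))
```

Here `in_W4(x)` and `in_W4(y)` hold while `in_W4(add(x, y))` fails, matching the first proof that $W_4$ is not closed under addition.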


Example 2.30. If $a_1$ and $a_2$ are real numbers (possibly 0), the set
$W = \{x : a_1 x^1 + a_2 x^2 = 0\} \subseteq \mathbf{R}^2$ is a subspace. Conceptually, the defining
equation is “preserved by addition and by scalar multiplication”.
Generally, if $a_1$, $a_2$, ..., $a_n$ are real numbers, the set


\[
\{x : a_1 x^1 + a_2 x^2 + \cdots + a_n x^n = 0\} \subseteq \mathbf{R}^n
\]


is a subspace. This is easily verified directly; see Theorem 2.37.


Definition 2.31. Let $n$ be a positive integer. A square matrix $A$
in $\mathbf{R}^{n \times n}$ is said to be symmetric if $A^{\mathsf{T}} = A$, and is skew-symmetric if
$A^{\mathsf{T}} = -A$.


Example 2.32. Let $V = \mathbf{R}^{n \times n}$ be the set of square matrices, regarded
as a vector space under matrix addition and scalar multiplication. The
set $\mathrm{Sym}^n$ of symmetric matrices in $V$ is a subspace: If $A$ and $B$ are
symmetric, and if $c$ is real, then


\[
(cA + B)^{\mathsf{T}} = cA^{\mathsf{T}} + B^{\mathsf{T}} = cA + B
\]


by Remark 1.15. Theorem 2.26 implies $\mathrm{Sym}^n$ is a subspace of $V$. Similarly,
the set $\mathrm{Skew}^n$ of skew-symmetric matrices in $V$ is a subspace.


Example 2.33. Let $V = \mathbf{R}^{2 \times 2}$ under addition and scalar multiplication.

The set $C$ of matrices $[A^i_j]$ satisfying $A^1_1 = A^2_2$ and $A^2_1 = -A^1_2$ has
typical element


\[
\begin{bmatrix} a^1 & -a^2 \\ a^2 & a^1 \end{bmatrix}
= a^1 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
+ a^2 \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
= a^1 I + a^2 J,
\]

where the final equality is the definition of $J$. If $A = a^1 I + a^2 J$ and
$B = b^1 I + b^2 J$ are in $C$, and if $c$ is real, then


\[ cA + B = c(a^1 I + a^2 J) + (b^1 I + b^2 J) = (ca^1 + b^1) I + (ca^2 + b^2) J \]


is in $C$. By Theorem 2.26, $C$ is a subspace of $V$.
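The set $W$ of matrices satisfying $A^1_1 A^2_2 - A^1_2 A^2_1 = 0$ is closed under
scalar multiplication, but not under addition. For example, if


\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
A + B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\]


then $A$ and $B$ are in $W$, but $A + B$ is not.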
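Both halves of this example lend themselves to a quick computation. The sketch below is illustrative only: the helpers `det2`, `add`, `scale`, and `in_C` are names introduced here, and the matrices `A` and `B` are taken to be the two diagonal matrices with a single entry 1, which is one pair that witnesses the failure of closure.

```python
def det2(M):
    """The quantity A^1_1 A^2_2 - A^1_2 A^2_1 for a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def add(M, N):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(M, N)]

def scale(c, M):
    """Scalar multiple cM."""
    return [[c * a for a in r] for r in M]

def in_C(M):
    """Membership in C: equal diagonal entries, opposite off-diagonal entries."""
    return M[0][0] == M[1][1] and M[1][0] == -M[0][1]

# W is not closed under addition: A and B are in W, but A + B is not.
A = [[1, 0], [0, 0]]
B = [[0, 0], [0, 1]]
dets = (det2(A), det2(B), det2(add(A, B)))

# C is closed under the combined operation cP + Q.
P = [[2, -3], [3, 2]]     # 2I + 3J
Q = [[-1, -4], [4, -1]]   # -I + 4J
closed = in_C(add(scale(5, P), Q))

print(dets, closed)
```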


Example 2.34. Let V = F(R,R), viewed as a vector space under 
ordinary addition and scalar multiplication of functions. The following 
subsets are vector subspaces, as is easily checked using Theorem 2.26. 

$Z(0) = \{f : f(0) = 0\}$, the set of functions vanishing at 0; generally,
$Z(x_0) = \{f : f(x_0) = 0\}$ if $x_0$ is real.

$C(\mathbf{R}) = \{f : f \text{ is continuous}\}$, $D(\mathbf{R}) = \{f : f \text{ is differentiable}\}$,
$C^1(\mathbf{R}) = \{f : f \text{ is continuously differentiable}\}$. In each case, a theorem
from analysis shows that a sum or scalar multiple of functions of the
stated type is another function of the same type.

$P_n$, the set of polynomial functions of degree at most $n$, and $P$, the
space of all polynomial functions of arbitrary (finite) degree.

The space of all cosine polynomials, or of all sine polynomials.

$\{f \text{ in } C^1 : f' = f\}$. This is an example of a solution space of a
differential equation.


Theorem 2.35. Let $V$ be a vector space, $W_1$ and $W_2$ subspaces of $V$.

(i) The intersection $W_1 \cap W_2$ is a subspace of $V$.

(ii) The union $W_1 \cup W_2$ is a subspace of $V$ if and only if $W_1 \subset W_2$ or
$W_2 \subset W_1$.


Proof. (i). Let $x$ and $y$ be elements of $W_1 \cap W_2$, and let $c$ be real. Since
$x$ and $y$ are elements of $W_1$ and $W_1$ is a subspace, $cx + y \in W_1$. An
entirely similar argument shows $cx + y \in W_2$. Thus $cx + y \in W_1 \cap W_2$.
Since $x$, $y$, and $c$ were arbitrary, $W_1 \cap W_2$ is a subspace.

(ii). If $W_1 \subset W_2$, then $W_1 \cup W_2 = W_2$ is a subspace. If $W_2 \subset W_1$,
then $W_1 \cup W_2 = W_1$ is a subspace.

Figure 2.2: The intersection and union of subspaces.


Conversely, suppose neither subspace is contained in the other. Pick
$x_1$ in $W_1 \setminus W_2$, and pick $x_2$ in $W_2 \setminus W_1$. Since each $x_i$ is in $W_1 \cup W_2$,
it suffices to show $x_1 + x_2$ is not in $W_1 \cup W_2$. But if $x_1 + x_2$ were an
element of $W_1$, then $x_2 = (x_1 + x_2) - x_1$ would be an element of $W_1$,
contrary to the choice of $x_2$. Similarly, if $x_1 + x_2$ were an element of $W_2$,
then $x_1$ would be an element of $W_2$, contrary to choice. This means
$x_1 + x_2$ is not in $W_1 \cup W_2$, so this set is not closed under addition. (The
union is closed under scalar multiplication, as you can check.)


Remark 2.36. An intersection of an arbitrary family of subspaces of V 
is a subspace of V, by an obvious modification of the proof of (i). 


Theorem 2.37. If $A = [A^i_j]$ is an $m \times n$ matrix, then

\[ W = \{[x^j] \text{ in } \mathbf{R}^n : A^i_1 x^1 + \cdots + A^i_n x^n = 0 \text{ for all } i = 1, \dots, m\} \]

is a subspace of $(\mathbf{R}^n, +, \cdot)$.

We give two proofs. 


Matrix multiplication. By the definition of matrix multiplication, the
set $W$ is the set of $x$ in $\mathbf{R}^n$ such that $Ax = 0^m$. If $x$ and $y$ are
elements of $W$, and if $c$ is real, then by properties of matrix multiplication
(Theorem 1.19),


\[ A(cx + y) = A(cx) + Ay = c(Ax) + Ay = c(0^m) + 0^m = 0^m. \]


That is, $cx + y \in W$, so $W$ is a subspace by Theorem 2.26.


Intersection of subspaces. For each $i = 1$, ..., $m$, consider the set


\[ W_i = \{[x^j] \text{ in } \mathbf{R}^n : A^i_1 x^1 + \cdots + A^i_n x^n = 0\}. \]

If $x = [x^j]$ and $y = [y^j]$ are in $W_i$ and if $c$ is real, then

\[
\begin{aligned}
A^i_1(cx^1 + y^1) + \cdots + A^i_n(cx^n + y^n)
&= c(A^i_1 x^1 + \cdots + A^i_n x^n) + (A^i_1 y^1 + \cdots + A^i_n y^n) \\
&= c \cdot 0 + 0 = 0.
\end{aligned}
\]


By Theorem 2.26, $W_i$ is a subspace of $\mathbf{R}^n$ for $i = 1$, 2, ..., $m$. By
Theorem 2.35 (i) (see also Remark 2.36), the intersection, namely $W$, is a subspace of $\mathbf{R}^n$.


Example 2.38. If $X \subset \mathbf{R}$ is an arbitrary non-empty subset, then

\[ Z(X) = \{f : \mathbf{R} \to \mathbf{R} : f(x) = 0 \text{ for all } x \text{ in } X\}, \]


the set of functions vanishing identically on $X$, is a subspace of $F(\mathbf{R}, \mathbf{R})$.
This can be checked using Theorem 2.26. Alternatively, if $x_0$ is real, the
set $Z(x_0) = \{f : f(x_0) = 0\}$ is a subspace, so $Z(X)$ is an intersection
of subspaces:

\[ Z(X) = \bigcap_{x_0 \in X} Z(x_0). \]


2.3 Spans, Sums of Subspaces 


In the preceding section, subspaces of a vector space (V,+,-) are con- 
structed by “cutting away”, namely by imposing conditions on elements 
of V. Dually, a subspace can be “built up” by taking sums and scalar 
multiples of elements. This is the viewpoint explored below. 


Definition 2.39. Let $(V, +, \cdot)$ be a vector space and $S \subset V$ a set of
vectors. A linear combination from $S$ is a finite sum of the form


\[ x^1 v_1 + x^2 v_2 + \cdots + x^m v_m = \sum_{i=1}^m x^i v_i, \]


in which the vectors $v_i$ in $S$ are distinct, and the $x^i$ are real numbers.

The span of $S$ is the set $\mathrm{Span}(S)$ of all linear combinations from $S$.
We say $S$ spans $V$ if $V = \mathrm{Span}(S)$. If some finite set $S$ spans $V$, we
say $V$ is finite-dimensional.


Lemma 2.40. Let $(V, +, \cdot)$ be a vector space, $S$ and $S'$ subsets of $V$.

(i) $\mathrm{Span}(S)$ is a subspace of $V$.

(ii) If $S \subset S'$, then $\mathrm{Span}(S) \subset \mathrm{Span}(S')$.

(iii) $\mathrm{Span}(\mathrm{Span}(S)) = \mathrm{Span}(S)$.


Proof. (i). If $S$ is empty, then $\mathrm{Span}(S) = \{0^V\}$, which is a subspace.
Otherwise, the fact that $\mathrm{Span}(S)$ is a subspace boils down to the fact
that “a linear combination of linear combinations is a linear combination”.
Precisely, assume $x$ and $y$ are elements of $\mathrm{Span}(S)$, and let
$v_1$, ..., $v_m$ enumerate the vectors appearing in either linear combination.
By hypothesis, there exist scalars $x^i$ and $y^i$, $1 \le i \le m$, such
that

\[ x = \sum_{i=1}^m x^i v_i, \qquad y = \sum_{i=1}^m y^i v_i. \]

If $c$ is real, then

\[ cx + y = c \sum_{i=1}^m x^i v_i + \sum_{i=1}^m y^i v_i = \sum_{i=1}^m (cx^i + y^i) v_i \in \mathrm{Span}(S). \]

By Theorem 2.26, $\mathrm{Span}(S)$ is a subspace.


(ii). This is essentially obvious: Every element of $S$ is an element
of $S'$, so every linear combination from $S$ is, a fortiori, a linear
combination from $S'$, i.e., $\mathrm{Span}(S) \subset \mathrm{Span}(S')$.


(iii). Clearly $S \subset \mathrm{Span}(S)$, so (ii) gives $\mathrm{Span}(S) \subset \mathrm{Span}(\mathrm{Span}(S))$.
Conversely, since $\mathrm{Span}(S)$ is a subspace, $\mathrm{Span}(S)$ is closed under
addition and scalar multiplication. That is, an arbitrary linear combination
from $\mathrm{Span}(S)$ is an element of $\mathrm{Span}(S)$, which proves the reverse
inclusion, $\mathrm{Span}(\mathrm{Span}(S)) \subset \mathrm{Span}(S)$.


Proposition 2.41. Let $(V, +, \cdot)$ be a vector space, and $S \subset V$. $\mathrm{Span}(S)$
is the intersection of all subspaces of $V$ that contain $S$.


Proof. Let $\{W_\alpha\}$ denote the family of all subspaces of $V$ that contain $S$,
and let $W = \bigcap_\alpha W_\alpha$ be the intersection.

($W \subset \mathrm{Span}(S)$). Since $\mathrm{Span}(S)$ is itself a subspace containing $S$,
i.e., is one of the $W_\alpha$, we have $W \subset \mathrm{Span}(S)$.


($\mathrm{Span}(S) \subset W$). Let $W_\alpha$ be an arbitrary subspace of $V$ such that
$S \subset W_\alpha$. By Lemma 2.40, $\mathrm{Span}(S) \subset \mathrm{Span}(W_\alpha) = W_\alpha$. Since $W_\alpha$ is
an arbitrary subspace containing $S$, $\mathrm{Span}(S) \subset \bigcap_\alpha W_\alpha = W$.

Sums and Direct Sums of Subspaces


Definition 2.42. Let $(V, +, \cdot)$ be a vector space. If $W_1$ and $W_2$ are
subspaces, their sum is


\[ W_1 + W_2 = \{v \text{ in } V : v = w_1 + w_2 \text{ for some } w_1 \text{ in } W_1 \text{ and } w_2 \text{ in } W_2\}. \]


Proposition 2.43. The sum $W_1 + W_2$ is a subspace of $V$.


Proof. Let $x$ and $y$ be arbitrary elements of $W_1 + W_2$, and let $c$ be real.
By hypothesis, there exist elements $x_1$ and $y_1$ in $W_1$, and $x_2$ and $y_2$
in $W_2$, such that $x = x_1 + x_2$ and $y = y_1 + y_2$. Thus


\[ cx + y = c(x_1 + x_2) + (y_1 + y_2) = (cx_1 + y_1) + (cx_2 + y_2). \]


Since $W_1$ is a subspace, $cx_1 + y_1 \in W_1$. Similarly, $cx_2 + y_2 \in W_2$. The
preceding therefore shows $cx + y \in W_1 + W_2$.


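Theorem 2.44. Let $V$ be a vector space. If $S_1$ and $S_2$ are subsets of $V$,
and if $W_i = \mathrm{Span}(S_i)$, then $\mathrm{Span}(S_1 \cup S_2) = W_1 + W_2$.


Proof. ($W_1 + W_2 \subset \mathrm{Span}(S_1 \cup S_2)$). Since $S_1 \subset S_1 \cup S_2$, Lemma 2.40
implies

\[ W_1 = \mathrm{Span}(S_1) \subset \mathrm{Span}(S_1 \cup S_2). \]

Similarly, $W_2 \subset \mathrm{Span}(S_1 \cup S_2)$. Since $\mathrm{Span}(S_1 \cup S_2)$ is a subspace, we
have $W_1 + W_2 \subset \mathrm{Span}(S_1 \cup S_2)$.


($\mathrm{Span}(S_1 \cup S_2) \subset W_1 + W_2$). If $x \in \mathrm{Span}(S_1 \cup S_2)$, then there exist
vectors $v_1$, ..., $v_m$ in $S_1 \cup S_2$ and scalars $x^i$ such that $x = \sum_i x^i v_i$.
Reindexing if necessary, we may assume there is a $k$, $0 \le k \le m$, such
that $v_1$, ..., $v_k$ are in $S_1$ and $v_{k+1}$, ..., $v_m$ are in $S_2$. Thus

\[ x = \Bigl(\sum_{i=1}^k x^i v_i\Bigr) + \Bigl(\sum_{i=k+1}^m x^i v_i\Bigr) \in W_1 + W_2. \]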


Definition 2.45. Let $(V, +, \cdot)$ be a vector space, and let $W_1$ and $W_2$
be subspaces. If $W_1 \cap W_2 = \{0^V\}$, we say $W_1 + W_2$ is a direct sum,
and write $W_1 \oplus W_2$.


Theorem 2.46. If $W = W_1 \oplus W_2$ is a direct sum of vector spaces, and
if $w \in W$, there exist unique vectors $w_1$ in $W_1$ and $w_2$ in $W_2$ such
that $w = w_1 + w_2$.

Proof. By the definition of a sum of vector spaces, there exist vectors
$w_1$ in $W_1$ and $w_2$ in $W_2$ satisfying $w = w_1 + w_2$. To prove uniqueness,
assume $w_1'$ in $W_1$ and $w_2'$ in $W_2$ provide another “decomposition” of $w$.
Since $w_1 + w_2 = w_1' + w_2'$, we have $w_1 - w_1' = w_2' - w_2$. However,
the left-hand side is in $W_1$ and the right-hand side is in $W_2$. Since
$W_1 \cap W_2 = \{0^V\}$, we have $w_1 - w_1' = 0^V$ and $w_2' - w_2 = 0^V$, i.e.,
$w_1 = w_1'$ and $w_2 = w_2'$.


Example 2.47. In $(\mathbf{R}^3, +, \cdot)$, the planes $W_1 = \{x : x^1 = 0\}$ and
$W_2 = \{x : x^2 = 0\}$ are subspaces (Theorem 2.37) whose sum is all
of $\mathbf{R}^3$. The general element $x = (x^1, x^2, x^3)$ may be written


\[ (0, x^2, x^3) + (x^1, 0, 0) = (0, x^2, 0) + (x^1, 0, x^3) \in W_1 + W_2. \]


The sum is not direct, since the non-zero vector $(0, 0, 1)$ is in $W_1 \cap W_2$.


Example 2.48. In $(\mathbf{R}^4, +, \cdot)$, the planes $W_1 = \{(x^1, x^2, 0, 0)\}$ and
$W_2 = \{(0, 0, x^3, x^4)\}$ are subspaces whose sum is all of $\mathbf{R}^4$, and the
sum is clearly direct.


Proposition 2.49. Let $n$ be a positive integer. If $\mathbf{R}^{n \times n}$ is the vector
space of square matrices under matrix addition and scalar multiplication,
$\mathrm{Sym}^n$ is the subspace of symmetric matrices, and $\mathrm{Skew}^n$ the
subspace of skew-symmetric matrices, then


\[ \mathbf{R}^{n \times n} = \mathrm{Sym}^n \oplus \mathrm{Skew}^n. \]


In particular, every square matrix can be written uniquely as the sum
of a symmetric matrix and a skew-symmetric matrix.


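Proof. Write $W_1 = \mathrm{Sym}^n$ and $W_2 = \mathrm{Skew}^n$.

($W_1 + W_2 = \mathbf{R}^{n \times n}$). Let $A$ be an arbitrary $n \times n$ matrix. The
matrices


\[ A_{\mathrm{sym}} = \tfrac{1}{2}(A + A^T), \qquad A_{\mathrm{skew}} = \tfrac{1}{2}(A - A^T) \]

are symmetric and skew-symmetric, respectively:

\[
\begin{aligned}
A_{\mathrm{sym}}^T &= \tfrac{1}{2}(A + A^T)^T = \tfrac{1}{2}\bigl(A^T + (A^T)^T\bigr) = \tfrac{1}{2}(A^T + A) = A_{\mathrm{sym}}, \\
A_{\mathrm{skew}}^T &= \tfrac{1}{2}(A - A^T)^T = \tfrac{1}{2}\bigl(A^T - (A^T)^T\bigr) = \tfrac{1}{2}(A^T - A) = -A_{\mathrm{skew}}.
\end{aligned}
\]

Moreover, $A_{\mathrm{sym}} + A_{\mathrm{skew}}$ is obviously $A$. This shows that an arbitrary
$n \times n$ matrix can be written as the sum of a symmetric and a
skew-symmetric matrix.


($W_1 \cap W_2 = \{0^{n \times n}\}$). If $A$ is both symmetric and skew-symmetric,
then $A = A^T = -A$, so $A = 0^{n \times n}$.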
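The decomposition in the proof is entirely explicit, so it can be computed directly. The following Python sketch is an illustration under stated assumptions: matrices are represented as lists of rows, and the helper names `transpose` and `sym_skew` are introduced here rather than taken from the text.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def sym_skew(M):
    """Return (A_sym, A_skew) = ((A + A^T)/2, (A - A^T)/2) with exact fractions."""
    T = transpose(M)
    half = Fraction(1, 2)
    sym = [[half * (a + b) for a, b in zip(r, s)] for r, s in zip(M, T)]
    skew = [[half * (a - b) for a, b in zip(r, s)] for r, s in zip(M, T)]
    return sym, skew

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
A_sym, A_skew = sym_skew(A)

# Recombine the two parts; the result should be A again.
recombined = [[a + b for a, b in zip(r, s)] for r, s in zip(A_sym, A_skew)]
print(A_sym)
print(A_skew)
```

Uniqueness is not visible in a computation like this; it comes from the intersection argument in the second half of the proof.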
2.4 Bases and Dimension


The “size” of a real vector space is quantified by the minimum number
of spanning elements. For example, $(\mathbf{R}^n, +, \cdot)$ is spanned by the standard
basis $(e_j)_{j=1}^n$. It turns out that every set of fewer than $n$ vectors
does not span, while every set of more than $n$ vectors is “redundant”,
or “linearly dependent”, in that some proper subset has the same span. (It is
not true that every set of $n$ vectors spans $\mathbf{R}^n$; for example, one of the
vectors could be $0^n$, or all the vectors could be proportional.)


Linear Dependence and Independence 


Definition 2.50. A set $S$ of vectors in $V$ is linearly dependent if there
exists a non-trivial linear combination from $S$ equal to the zero vector,
i.e., if there exist distinct vectors $v_1$, ..., $v_k$ in $S$ ($k \ge 1$) and scalars $x^i$,
not all zero, such that

\[ \sum_{i=1}^k x^i v_i = 0^V. \]

If $S$ is not linearly dependent, we say $S$ is linearly independent.


Remark 2.51. The empty set is linearly independent. A non-empty
set $S$ is linearly independent if and only if no non-trivial linear
combination from $S$ is $0^V$. For later use we give a formal statement.


Lemma 2.52. Let $(V, +, \cdot)$ be a vector space. A non-empty set $S$ of
vectors in $V$ is linearly independent if and only if: For all $(v_i)_{i=1}^k$ of distinct vectors in $S$
and all scalars $(x^i)_{i=1}^k$, if $\sum_i x^i v_i = 0^V$, then $x^i = 0$ for all $i$.


Example 2.53. In $(\mathbf{R}^2, +, \cdot)$, the set $S = \{(1,0), (0,1)\} = (e_j)_{j=1}^2$
is linearly independent: If $(x^1, x^2) = x^1 e_1 + x^2 e_2 = (0,0)$, then (by
equating components) $x^1 = x^2 = 0$.

The set $S' = \{(1,1), (1,-1), (4,2)\} = (v_j)_{j=1}^3$ is linearly dependent.
For example, $3v_1 + v_2 - v_3 = (0,0)$.


Theorem 2.54. Let $(V, +, \cdot)$ be a vector space, and let $S$ be a set of
vectors in $V$. The following are equivalent:


(i) $S$ is linearly dependent.


(ii) There exists a vector $v$ in $S$ such that $v \in \mathrm{Span}(S \setminus \{v\})$. That is,
some element of $S$ can be expressed as a linear combination of the
other elements of $S$.

(iii) There is a proper subset $S'$ of $S$ such that $\mathrm{Span}(S') = \mathrm{Span}(S)$.


Proof. ((i) implies (ii)). If $S$ is linearly dependent, there exist distinct
vectors $v_1$, ..., $v_k$ in $S$ and scalars $x^1$, ..., $x^k$, not all zero, such that


\[ 0^V = x^1 v_1 + \cdots + x^k v_k. \]


Without loss of generality, we may assume $x^1 \neq 0$. The preceding
equation may be rearranged as


\[ v_1 = -\frac{1}{x^1}\bigl(x^2 v_2 + \cdots + x^k v_k\bigr), \]


which expresses some element of $S$ as a linear combination of other
elements.

((ii) implies (iii)). Let $v$ be as in (ii), and let $S' = S \setminus \{v\} \subset S$.
By Lemma 2.40, $\mathrm{Span}(S') \subset \mathrm{Span}(S)$. Conversely, $v \in \mathrm{Span}(S')$, so
every linear combination from $S$ is a linear combination from $S'$, i.e.,
$\mathrm{Span}(S) \subset \mathrm{Span}(S')$.

((iii) implies (i)). Let $S'$ be a proper subset of $S$ with $\mathrm{Span}(S') =
\mathrm{Span}(S)$, and let $v_1$ be an element of $S \setminus S'$. Since $v_1 \in \mathrm{Span}(S')$, there
exist vectors $v_2$, ..., $v_k$ in $S'$ (necessarily distinct from $v_1$) and scalars
$x^2$, ..., $x^k$, such that

\[ v_1 = x^2 v_2 + \cdots + x^k v_k, \quad \text{i.e.,} \quad -v_1 + x^2 v_2 + \cdots + x^k v_k = 0^V. \]


This is a non-trivial linear combination from $S$ whose sum is $0^V$. By
definition, $S$ is linearly dependent.


Corollary 2.55. Let $S$ be a linearly independent subset of some vector
space $(V, +, \cdot)$. If $v \in V \setminus S$, then $S \cup \{v\}$ is linearly independent if and
only if $v \notin \mathrm{Span}(S)$.

Proof. If $v \in \mathrm{Span}(S)$, then $\mathrm{Span}(S \cup \{v\}) = \mathrm{Span}(S)$, so $S \cup \{v\}$
is linearly dependent by the theorem. Contrapositively, if $S \cup \{v\}$ is
linearly independent, then $v \notin \mathrm{Span}(S)$.

Suppose $v \notin \mathrm{Span}(S)$, and assume


\[ xv + x^1 v_1 + \cdots + x^k v_k = 0^V \]


for some vectors $v_i$ in $S$ and some scalars $x$, $x^i$. If $x \neq 0$, the preceding
equation could be rearranged to express $v$ as a linear combination
from $S$, contrary to hypothesis; thus $x = 0$. Because $S$ is linearly
independent, $x^i = 0$ for $i = 1$, ..., $k$. That is, the only linear combination
from $S \cup \{v\}$ equal to $0^V$ is the trivial linear combination. By
Lemma 2.52, $S \cup \{v\}$ is linearly independent.

Bases and Coordinate Vectors


Definition 2.56. An ordered, linearly independent spanning set of V 
is called a basis of V. 


Remark 2.57. If (v1, v2) is a basis, then (v2, v1) is a different basis. 
(Round brackets signify an ordered set, while curly braces signify an 
unordered set.) Many books do not explicitly specify that a basis is 
ordered, but ordering is implicit when basis elements are indexed. 


Theorem 2.58. Let $(V, +, \cdot)$ be a vector space spanned by some finite
set $T$. If $S \subset V$ is a linearly independent set, there exists a basis $S'$
with $S \subset S' \subset S \cup T$. In particular, some subset of $T$ is a basis of $V$.


Proof. The proof is inductive. Introduce the “initial” set $S_0 = S$, and
write $T = (y_1, \dots, y_n)$. For $m = 1$, ..., $n$, define


\[
S_m = \begin{cases}
S_{m-1} \cup \{y_m\} & \text{if } y_m \notin \mathrm{Span}(S_{m-1}), \\
S_{m-1} & \text{if } y_m \in \mathrm{Span}(S_{m-1}).
\end{cases}
\]


By construction, $S_{m-1} \subset S_m \subset S \cup T$ and $\mathrm{Span}(y_j)_{j=1}^m \subset \mathrm{Span}(S_m)$
for each $m$. In particular, $S \subset S_m \subset S \cup T$ for each $m$.
By Corollary 2.55 and induction on $m$, $S_m$ is linearly independent.
The set $S' = S_n$ is linearly independent, satisfies $S \subset S' \subset S \cup T$,
and spans $V$ because $V = \mathrm{Span}(y_j)_{j=1}^n \subset \mathrm{Span}(S') \subset V$.
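The inductive construction in this proof is effectively an algorithm when $V = \mathbf{R}^n$. The sketch below is one possible rendering, not the text's own procedure: it tests the condition "y_m is not in the span of S_{m-1}" by checking whether appending the vector raises the rank, and the function names `rank` and `extend_to_basis` are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot available in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extend_to_basis(S, T):
    """Given an independent list S and a spanning list T, return S' with S <= S' <= S + T."""
    current = list(S)
    for y in T:
        # y lies outside Span(current) exactly when appending it raises the rank.
        if rank(current + [y]) > rank(current):
            current.append(y)
    return current

S = [(1, 1, 0)]
T = [(1, 0, 0), (2, 2, 0), (0, 1, 0), (0, 0, 1)]
basis = extend_to_basis(S, T)
print(basis)
```

Vectors of T are examined in the given order, so a different ordering of T can produce a different basis, just as in the proof.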


Theorem 2.59. Let $V$ be a vector space, and assume $S = (v_j)_{j=1}^n$ is
a basis of $V$. For every vector $x$ in $V$, there exists a unique ordered
$n$-tuple of scalars $(x^j)_{j=1}^n$ such that


\[ x = \sum_{j=1}^n x^j v_j. \]


Proof. (Existence). Let $x$ be an arbitrary element of $V$. Existence of
scalars $x^j$ as in the theorem follows immediately from the definition of
a spanning set.


(Uniqueness). Suppose there exist vectors $v_1$, ..., $v_m$ in $S$ and
scalars $x^i$ and $y^i$ such that $x = \sum_i x^i v_i$ and $x = \sum_i y^i v_i$. Subtracting
the second from the first,

\[ 0^V = \Bigl(\sum_{i=1}^m x^i v_i\Bigr) - \Bigl(\sum_{i=1}^m y^i v_i\Bigr) = \sum_{i=1}^m (x^i - y^i) v_i. \]


By Lemma 2.52, $x^i - y^i = 0$ for all $i$, i.e., $x^i = y^i$ for all $i$.

Definition 2.60. The ordered $n$-tuple $[x]^S = [x^j]$ associated to the
vector $x$ with respect to the basis $S$ is called the coordinate vector of $x$
(in the basis $S$).


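Example 2.61. Let $S = (e_j)_{j=1}^n$ be the standard basis of $\mathbf{R}^n$,


\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad
e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.
\]

If $x = [x^j] \in \mathbf{R}^n$, then $x = \sum_j x^j e_j = [x]^S$; every vector is its own
coordinate vector in the standard basis.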


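Example 2.62. In $\mathbf{R}^{2 \times 2}$, the matrices $(e_i^j)_{i,j=1}^2$, namely,


\[
e_1^1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad
e_1^2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad
e_2^1 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad
e_2^2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},
\]

constitute the standard basis of $\mathbf{R}^{2 \times 2}$. An arbitrary matrix $A = [A^i_j]$
may be written uniquely as a linear combination

\[
\begin{bmatrix} A^1_1 & A^1_2 \\ A^2_1 & A^2_2 \end{bmatrix}
= A^1_1 e_1^1 + A^1_2 e_1^2 + A^2_1 e_2^1 + A^2_2 e_2^2 = \sum_{i,j=1}^2 A^i_j e_i^j.
\]


Example 2.63. For each pair of indices $(i, j)$ with $1 \le i \le m$ and
$1 \le j \le n$, let $e_i^j$ be the $m \times n$ matrix having $e_i$ as its $j$th column and all
other columns equal to 0, i.e., having a 1 in the $i$th row and $j$th column
and 0 elsewhere. The collection $S = (e_i^j)$ of all such matrices forms the
standard basis of $\mathbf{R}^{m \times n}$. If $A = [A^i_j] \in \mathbf{R}^{m \times n}$, then


\[ A = \sum_{i=1}^m \sum_{j=1}^n A^i_j e_i^j. \]


That is, every $m \times n$ real matrix is its own coordinate vector with
respect to the standard basis of $\mathbf{R}^{m \times n}$.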


Dimension of a Vector Space 


Theorem 2.64. Let $(V, +, \cdot)$ be a vector space spanned by a set $S$
containing $m$ elements. If $S' \subset V$ is a linearly independent set of
$n$ elements, then $n \le m$.

Proof. Write $S = (v_i)_{i=1}^m$ and $S' = (v_j')_{j=1}^n$. Because $S$ spans $V$, there
exist scalars $A^i_j$, $1 \le i \le m$ and $1 \le j \le n$, such that


\[ v_j' = \sum_{i=1}^m A^i_j v_i \quad \text{for all } j = 1, \dots, n. \]


These scalars constitute a matrix $A = [A^i_j]$ in $\mathbf{R}^{m \times n}$, which defines a
homogeneous linear system $Ax = 0^m$ of $m$ equations in $n$ variables.
If $x = [x^j]$ is an arbitrary vector of coefficients, then


\[
\sum_{j=1}^n x^j v_j' = \sum_{j=1}^n x^j \Bigl(\sum_{i=1}^m A^i_j v_i\Bigr)
= \sum_{i=1}^m \Bigl(\sum_{j=1}^n A^i_j x^j\Bigr) v_i.
\]


In particular, if $x$ is a solution of $Ax = 0^m$, then the linear combination
on the left is $0^V$.

If $m < n$, i.e., the system $Ax = 0^m$ has more variables than equations,
then there exists a non-trivial solution $x$ by Remark 1.37. This
means $0^V$ is a non-trivial linear combination from $S'$, i.e., $S'$ is linearly
dependent.

Contrapositively, if $S'$ is linearly independent, then $n \le m$.


Corollary 2.65. Let $(V, +, \cdot)$ be a vector space admitting a basis consisting
of $m$ elements. If $S' = (v_1, \dots, v_n)$ is a basis of $V$, then $n = m$.


Proof. Let $S$ be a basis of $V$ containing $m$ elements. Since $S$ spans $V$
and $S'$ is linearly independent, the theorem gives $n \le m$. Since $S'$ spans
$V$ and $S$ is linearly independent, $m \le n$.


Definition 2.66. If some (hence every) basis of $V$ contains exactly
$n$ elements, we say $V$ is $n$-dimensional, and write $n = \dim V$. If $V$ has
no finite basis, we say $V$ is infinite-dimensional, and write $\dim V = \infty$.


Corollary 2.67. If $(V, +, \cdot)$ is an $n$-dimensional vector space, and if
$W$ is a subspace, then $\dim W \le n$, with equality if and only if $W = V$.


Proof. We first show $W$ has a finite basis.

Let $S_0 = \varnothing$. If $W = \{0^V\}$, then $S_0$ is a basis. Otherwise, define
sets $S_k$ (for $k = 1$, 2, ...) inductively as follows: If $W \neq \mathrm{Span}(S_{k-1})$,
pick a vector $v_k$ in $W \setminus \mathrm{Span}(S_{k-1})$ and put $S_k = S_{k-1} \cup \{v_k\}$. By
construction, the set $S_k$ is linearly independent and contains exactly
$k$ elements.

It must be that $W = \mathrm{Span}(S_m)$ for some integer $m$ with $0 \le m \le n$;
if not, then $S_{n+1} \subset V$ is a linearly independent set of $(n+1)$ elements,
contrary to Theorem 2.64. This proves simultaneously that $\dim W$ is
finite and no larger than $n$.

If $W = V$, then obviously $\dim W = \dim V$.

If $W \neq V$, there exists a vector $v_0$ in $V \setminus W$; if $(v_j)_{j=1}^m$ is a basis
of $W$, then $(v_j)_{j=0}^m$ is a linearly independent set in $V$ by Corollary 2.55,
so $\dim W = m < m + 1 \le \dim V$. Contrapositively, if $\dim W = \dim V$,
then $W = V$.


Corollary 2.68. If (V,+,-) is an n-dimensional vector space, and if 
S is a set of n elements, then S is linearly independent if and only if 
Span(S) = V. 


Proof. Apply the preceding corollary to W = Span(S). 


Example 2.69. The standard basis of $(\mathbf{R}^n, +, \cdot)$ contains $n$ elements,
so $\dim \mathbf{R}^n = n$.


Example 2.70. The standard dual basis of $((\mathbf{R}^n)^*, +, \cdot)$ contains $n$ elements,
so $\dim (\mathbf{R}^n)^* = n$.


Example 2.71. The standard basis of $(\mathbf{R}^{m \times n}, +, \cdot)$ contains $mn$ elements,
so $\dim \mathbf{R}^{m \times n} = mn$.


Example 2.72. Let $n$ be a non-negative integer. The polynomial space


\[ P_n = \{a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n\} \]


has the basis $(1, t, t^2, \dots, t^n)$, containing $(n+1)$ elements. (Superscripts
denote exponents here.) Thus $\dim P_n = n + 1$.


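Example 2.73. The vector space of symmetric $2 \times 2$ real matrices is
3-dimensional, and the space of skew-symmetric $2 \times 2$ real matrices is
1-dimensional. To show this, compare an arbitrary $2 \times 2$ matrix $A$ to
its transpose $A^T$:

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix}. \]


These are equal if and only if $b = c$. These are negatives if and only if
$b = -c$ and $a = d = 0$. General symmetric and skew-symmetric $2 \times 2$
matrices can therefore be written (with $e_i^j$ the matrix having a 1 in row $i$,
column $j$, and 0 elsewhere)


\[
\begin{bmatrix} a & b \\ b & d \end{bmatrix}
= a \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ b \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
+ d \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
= a e_1^1 + b(e_1^2 + e_2^1) + d e_2^2,
\qquad
\begin{bmatrix} 0 & b \\ -b & 0 \end{bmatrix} = b(e_1^2 - e_2^1).
\]

The right-hand sides are linear combinations of specific matrices, and
clearly the only way a right-hand side can be the zero matrix is if every
coefficient is 0. It follows that $\{e_1^1, e_1^2 + e_2^1, e_2^2\}$ is a basis for $\mathrm{Sym}^2$, and
$\{e_1^2 - e_2^1\}$ is a basis of $\mathrm{Skew}^2$.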


Example 2.74. The space $P$ of all polynomial functions has infinite
basis $\{1, t, t^2, t^3, \dots\}$, so $\dim P = \infty$.


Example 2.75. In the vector space (R”,+,-) of real sequences, define 
\[ e_1 = (1, 0, 0, \dots), \quad e_2 = (0, 1, 0, \dots), \quad e_3 = (0, 0, 1, 0, \dots), \quad \dots, \]
with $e_j$ having a 1 in the $j$th component and 0s elsewhere. The set
$S = (e_j)_{j=1}^\infty$ is linearly independent, but does not span the space of real sequences: Every linear
combination from S is a finite sum, so every element of Span(S) has 
only finitely many non-zero components, while “most” elements of R” 
have infinitely many non-zero components. 

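If we let $S^n = (e_j)_{j=1}^n$ be the first $n$ elements of $S$, we may identify
$\mathbf{R}^n$ with $\mathrm{Span}(S^n)$, and with this identification we have inclusions
$\mathbf{R}^1 \subset \mathbf{R}^2 \subset \cdots \subset \mathbf{R}^n \subset \cdots$. The union of these spaces, denoted $\mathbf{R}^\infty$,
is $\mathrm{Span}(S)$. For every positive integer $n$, we have proper inclusions


\[ \mathbf{R}^n \subset \mathbf{R}^{n+1} \subset \mathbf{R}^\infty \subset \{\text{all real sequences}\}. \]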


Theorem 2.76. The solution space of a homogeneous linear system
$Ax = 0^m$ of $m$ equations in $n$ variables is a vector subspace of $\mathbf{R}^n$
whose dimension is equal to the number of free variables.


Proof. The solution set of $Ax = 0^m$ is a subspace of $\mathbf{R}^n$ by Theorem 2.37.
When the coefficient matrix is put into reduced row-echelon
form, the resulting system $A'x = 0^m$ has precisely the same solution
space as the original system.

Let $\ell$ denote the number of free variables, and say the free variables
are $x^{k_1}$, ..., $x^{k_\ell}$; for simplicity, write $x^{k_j} = y^j$. The basic variables are
linear combinations of the free variables.

For each $j = 1$, ..., $\ell$, let $v_j$ be the solution obtained by setting $y^j = 1$
and all other free variables equal to 0. In particular, the $k_j$ component
of $v_j$ is 1, and every other “free” solution $v_i$ has a 0 in the $k_j$ component.

The solution space is clearly spanned by $(v_j)_{j=1}^\ell$, and this set is
linearly independent: If $a^j$ are scalars satisfying $0^n = a^1 v_1 + \cdots + a^\ell v_\ell$,
then $a^j = 0$, as the $k_j$th component of the linear combination. This
shows the solution space has a basis of $\ell$ elements ($\ell$ the number of free
variables).
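The basis described in this proof can be produced mechanically from the reduced row-echelon form. The following Python sketch is one way to do so under stated assumptions: the functions `rref` and `nullspace_basis` are written here for illustration, columns are indexed from 0 rather than 1, and the sample matrix `A` is an arbitrary choice.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # column c corresponds to a free variable
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def nullspace_basis(A):
    """One basis vector per free variable: set it to 1, the other free variables to 0."""
    R, pivots = rref(A)
    n = len(A[0])
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, p in enumerate(pivots):
            v[p] = -R[row][f]  # basic variable solved in terms of the free one
        basis.append(v)
    return basis

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
basis = nullspace_basis(A)
print(basis)
```

Here the second and fourth variables are free, so the solution space is 2-dimensional, in agreement with the theorem.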
Definition 2.77. If $X$ is a vector space and $W \subset X$ is a subspace, a
subspace $X'$ is said to be complementary to $W$ in $X$ if $X = W \oplus X'$.


Lemma 2.78. If $X$ is a finite-dimensional vector space and $W$ is a
subspace, there exists a subspace $X'$ complementary to $W$ in $X$, and we
have $\dim X = \dim W + \dim X'$.


Proof. Write $k = \dim W$, $k + \ell = \dim X$. Pick a basis $(w_1, \dots, w_k)$
of $W$. By Theorem 2.58, there exist vectors $(x_1, \dots, x_\ell)$ such that
$S = (w_1, \dots, w_k, x_1, \dots, x_\ell)$ is a basis of $X$.

Consider the subspace $X' = \mathrm{Span}(x_1, \dots, x_\ell)$. Since $S$ spans $X$,
$X = W + X'$. Since $S$ is linearly independent, $W \cap X' = \{0\}$. Finally,
$\ell = \dim X'$, so $\dim X = k + \ell = \dim W + \dim X'$.


Theorem 2.79 (The Dimension Theorem). If $X$ and $Y$ are finite-dimensional
subspaces of some vector space $(V, +, \cdot)$, then


\[ \dim(X + Y) = \dim X + \dim Y - \dim(X \cap Y). \]


Proof. Let $W = X \cap Y$, and pick a basis $(w_1, \dots, w_k)$ of $W$. Now use
Lemma 2.78 to pick a subspace $X' \subset X$ with basis $(x_1, \dots, x_\ell)$ such
that $X = W \oplus X'$, and a subspace $Y' \subset Y$ with basis $(y_1, \dots, y_m)$
such that $Y = W \oplus Y'$.

We claim that $S = (w_1, \dots, w_k, x_1, \dots, x_\ell, y_1, \dots, y_m)$ is a basis
of $X + Y$. Clearly $X + Y = \mathrm{Span}(S)$. To prove $S$ is linearly independent,
suppose there exist scalars $(a^i)_{i=1}^k$, $(b^i)_{i=1}^\ell$, and $(c^i)_{i=1}^m$ such that


\[
0^V = \sum_{i=1}^k a^i w_i + \sum_{i=1}^\ell b^i x_i + \sum_{i=1}^m c^i y_i = w + x + y \in W + X' + Y'.
\]


Rearranging, $-y = w + x$; the left side is an element of $Y' \subset Y$ while
the right side is in $X$. The common value is therefore an element of
$W = X \cap Y$. But $W \cap Y' = \{0\}$, so $y = 0$ and $w + x = 0$. Since
$W \cap X' = \{0\}$, we have $w = 0$ and $x = 0$.

Each of the sets $(w_i)_{i=1}^k$, $(x_i)_{i=1}^\ell$, and $(y_i)_{i=1}^m$ is linearly independent,
and the summands $w$, $x$, and $y$ are individually 0, so the scalars
$(a^i)_{i=1}^k$, $(b^i)_{i=1}^\ell$, and $(c^i)_{i=1}^m$ are all zero. This shows that no non-trivial
linear combination from $S$ is 0. That is, $S$ is linearly independent,
hence is a basis for $X + Y$.

The dimension of $X + Y$ is found by counting basis elements:


\[
\begin{aligned}
\dim(X + Y) = k + \ell + m &= (k + \ell) + (k + m) - k \\
&= \dim X + \dim Y - \dim(X \cap Y).
\end{aligned}
\]
Example 2.80. A plane through the origin in $(\mathbf{R}^n, +, \cdot)$ is a
two-dimensional subspace.

If $W_1$ and $W_2$ are distinct planes through the origin in $\mathbf{R}^3$, then
$2 < \dim(W_1 + W_2) \le \dim(\mathbf{R}^3) = 3$, so in fact $W_1 + W_2 = \mathbf{R}^3$ by
Corollary 2.67. Theorem 2.79 implies $\dim(W_1 \cap W_2) = 1$. That is,
distinct planes through the origin in $\mathbf{R}^3$ span $\mathbf{R}^3$, and intersect in a
line through the origin.

A pair of distinct planes through the origin in $\mathbf{R}^4$ can intersect in
a line (if the planes span a three-dimensional subspace) or a point (if
the planes span $\mathbf{R}^4$).


Example 2.81. If $W_1$ and $W_2$ are distinct three-dimensional subspaces
of $(\mathbf{R}^6, +, \cdot)$, their intersection can be a plane (if $\dim(W_1 + W_2) = 4$);
a line (if $\dim(W_1 + W_2) = 5$); or a point (if $W_1 + W_2 = \mathbf{R}^6$).


Definition 2.82. An $n \times n$ matrix $[A^i_j]$ is:


(i) Upper triangular if every entry below the main diagonal is zero,
i.e., if $A^i_j = 0$ whenever $i > j$.


(ii) Lower triangular if every entry above the main diagonal is zero,
i.e., if $A^i_j = 0$ whenever $i < j$.


(iii) Diagonal if every entry off the main diagonal is zero, i.e., if $A^i_j = 0$
whenever $i \neq j$.


Remark 2.83. The sets of upper triangular and lower triangular $n \times n$
matrices span $\mathbf{R}^{n \times n}$, and their intersection is the space of diagonal
matrices.


Example 2.84. If $e_i^j$ denotes the standard basis matrix with a 1 in the
$(i, j)$ entry and 0's elsewhere, then $e_i^j$ is upper triangular if and only if
$i \le j$; lower triangular if and only if $j \le i$; and diagonal if and only if
$i = j$.

When $n = 2$, the space of diagonal matrices has basis $(e_1^1, e_2^2)$. In
addition, $e_1^2$ is upper triangular, and $e_2^1$ is lower triangular. If we let
$D$, $U$, and $L$ denote the spaces of $2 \times 2$ diagonal, upper triangular, and
lower triangular matrices, respectively, then $D = U \cap L$, $\mathbf{R}^{2 \times 2} = U + L$,
and

\[ \dim \mathbf{R}^{2 \times 2} = 4 = 3 + 3 - 2 = \dim U + \dim L - \dim D, \]


in agreement with Theorem 2.79.
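The dimension count above can be reproduced numerically by computing ranks. In the sketch below, which is illustrative only, a $2 \times 2$ matrix with rows $(p, q)$ and $(r, s)$ is flattened to the 4-tuple $(p, q, r, s)$; this flattening, the helper `rank`, and the second example with two coordinate planes in $\mathbf{R}^3$ are assumptions added for the illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Standard basis matrices of R^{2x2}, flattened row by row.
e11, e12, e21, e22 = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
U = [e11, e12, e22]   # upper triangular
L = [e11, e21, e22]   # lower triangular
D = [e11, e22]        # diagonal, the intersection of U and L

# dim(U + L) is the rank of the combined spanning list.
dim_sum = rank(U + L)
formula = rank(U) + rank(L) - rank(D)

# Two coordinate planes in R^3 meeting in a line.
X = [(1, 0, 0), (0, 1, 0)]
Y = [(0, 1, 0), (0, 0, 1)]
line = [(0, 1, 0)]
dim_sum_planes = rank(X + Y)

print(dim_sum, formula, dim_sum_planes)
```

In both cases the intersection is supplied by hand; the computation checks the formula but does not find the intersection on its own.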
Computational Algorithms


Let $(V, +, \cdot)$ be a vector space, and let $S = (v_j)_{j=1}^n$ be an ordered
subset. To determine whether $S$ is linearly dependent or independent,
express the zero vector $0^V$ as a linear combination with coefficients
$x^1$, ..., $x^n$, reduce the resulting coefficient matrix to echelon form, and
determine whether there are free variables (dependent) or not
(independent).


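Example 2.85. Is the set $S = \{(1, 1, -1, -1), (2, 2, 2, 2), (0, 0, 1, 1)\}$
linearly dependent? Here we might notice that $2v_1 - v_2 + 4v_3 = 0^4$ by
inspection, so $S$ is linearly dependent. We can also proceed systematically.
Suppose $x^1$, $x^2$, and $x^3$ are scalars such that $\sum_j x^j v_j = 0^4$:


\[
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
= x^1 \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}
+ x^2 \begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix}
+ x^3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} x^1 + 2x^2 \\ x^1 + 2x^2 \\ -x^1 + 2x^2 + x^3 \\ -x^1 + 2x^2 + x^3 \end{bmatrix}.
\]


This is a homogeneous system, whose coefficient matrix is the $4 \times 3$
matrix whose columns are the elements of $S$. (That is, we need not
have written out the system at all!) We have


\[
\begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ -1 & 2 & 1 \\ -1 & 2 & 1 \end{bmatrix}
\xrightarrow{\substack{R_2 - R_1, \\ R_4 - R_3, \\ R_2 \leftrightarrow R_3}}
\begin{bmatrix} 1 & 2 & 0 \\ -1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\xrightarrow{\substack{R_2 + R_1, \\ \frac{1}{4} R_2}}
\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & \frac{1}{4} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]


From the row-echelon form, we see that $x^1$ and $x^2$ are basic and $x^3$ is
free. This system has a non-trivial solution (i.e., $0^4$ is a non-trivial
linear combination from $S$), so $S$ is linearly dependent.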
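The systematic test can be automated: a finite list of vectors in $\mathbf{R}^m$ is linearly dependent exactly when the matrix having those vectors as columns has rank smaller than the number of vectors, i.e., when a free variable occurs. The Python sketch below is an illustration; the function names `rank` and `is_linearly_dependent` are introduced here, and the vectors are those of the set $S$ considered above.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: this column gives a free variable
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_linearly_dependent(vectors):
    """True when the homogeneous system with these columns has a free variable."""
    coefficient_matrix = [list(row) for row in zip(*vectors)]  # vectors as columns
    return rank(coefficient_matrix) < len(vectors)

v1, v2, v3 = (1, 1, -1, -1), (2, 2, 2, 2), (0, 0, 1, 1)
S = [v1, v2, v3]
dependent = is_linearly_dependent(S)

# The relation found by inspection: 2 v1 - v2 + 4 v3 = 0.
relation = tuple(2 * a - b + 4 * c for a, b, c in zip(v1, v2, v3))
print(dependent, relation)
```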


Example 2.86. For the $S$ of the preceding example, is $b = (1, 2, 3, 3)$
in $\mathrm{Span}(S)$? Here we wish to express $b$ as a linear combination from $S$,
i.e., to find scalars $x^j$ such that $x^1 v_1 + x^2 v_2 + x^3 v_3 = b$, or prove
none exist. Letting $A$ denote the coefficient matrix in the preceding
example, we wish to solve $Ax = b$. We form the augmented matrix
and row reduce:


\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ 1 & 2 & 0 & 2 \\ -1 & 2 & 1 & 3 \\ -1 & 2 & 1 & 3 \end{array}\right]
\xrightarrow{\substack{R_2 - R_1, \\ R_4 - R_3, \\ R_2 \leftrightarrow R_3}}
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 1 \\ -1 & 2 & 1 & 3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right].
\]

Already we can stop; the third equation is inconsistent; the system
$Ax = b$ has no solutions, so $b \notin \mathrm{Span}(S)$.
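Membership in a span can be phrased as a rank comparison: the system $Ax = b$ is consistent exactly when adjoining $b$ as an extra column does not raise the rank. The sketch below uses that criterion, which is a standard reformulation rather than the row reduction displayed above; the function names `rank` and `in_span` are assumptions made for the illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(vectors, b):
    """True when b is a linear combination of the given vectors."""
    A = [list(row) for row in zip(*vectors)]             # vectors as columns
    augmented = [row + [bi] for row, bi in zip(A, b)]    # the matrix [A | b]
    return rank(A) == rank(augmented)

S = [(1, 1, -1, -1), (2, 2, 2, 2), (0, 0, 1, 1)]
b = (1, 2, 3, 3)
b_in_span = in_span(S, b)

# For contrast, v1 + v3 = (1, 1, 0, 0) certainly lies in the span.
sum_in_span = in_span(S, (1, 1, 0, 0))
print(b_in_span, sum_in_span)
```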


Example 2.87. In the (four-dimensional) polynomial space $P_3$, consider
the set $S' = \{1 + t - t^2 - t^3, 2 + 2t + 2t^2 + 2t^3, t^2 + t^3\}$, and let
$q(t) = 1 + 2t + 3t^2 + 3t^3$. (Superscripts on $t$ denote exponents.) Is
$S'$ linearly independent? Is $q$ an element of $\mathrm{Span}(S')$?

We use $a^j$ to denote unknowns. (Superscripts on $a$ denote indices.)
We have two computational alternatives.

The first is to set $a^1 p_1 + a^2 p_2 + a^3 p_3 = 0$ as polynomials (i.e., to
expand the left-hand side and equate all coefficients to 0), then solve
the resulting system for the $a^j$, looking for a non-trivial solution; or to
set $a^1 p_1 + a^2 p_2 + a^3 p_3 = q$ as polynomials and attempt to solve.

The second method is to pick any convenient basis of $P_3$, express
each element of $S'$ (and the polynomial $q$) as a coordinate vector
(Definition 2.60), and to perform computations in $\mathbf{R}^4$. But $\{1, t, t^2, t^3\}$
is a basis of $P_3$, and the coordinate vectors of the elements of $S'$ are
precisely the elements of $S$ in the preceding examples, while the coordinate
vector of $q$ is $b = (1, 2, 3, 3)$. From our work in the two preceding
examples, we learn that $S'$ is linearly dependent, and $q \notin \mathrm{Span}(S')$.


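Example 2.88. Let $S = \{(0,1,1), (1,0,1), (1,1,0)\} = (v_j)_{j=1}^3$. Determine
whether $S$ is a basis of $\mathbf{R}^3$; if so, and if $b = (b^1, b^2, b^3)$ is arbitrary,
find the coordinate vector $[b]^S$.

Let $x^1$, $x^2$, and $x^3$ be coefficients. In theory our task is twofold:
Show that if $\sum_j x^j v_j = 0^3$, then $x^j = 0$ for all $j$ (so that $S$ is linearly
independent, hence a basis of $\mathbf{R}^3$); and solve the system $\sum_j x^j v_j = b$.
Both operations entail forming the coefficient matrix $A$ whose columns
are the elements of $S$ and row reducing. Rather than row reduce the
same matrix twice, we set out (“optimistically”) to answer the second
question, and along the way answer the first.

The augmented matrix is reduced as follows:


\[
\left[\begin{array}{ccc|c} 0 & 1 & 1 & b^1 \\ 1 & 0 & 1 & b^2 \\ 1 & 1 & 0 & b^3 \end{array}\right]
\xrightarrow{\substack{R_3 - R_2, \\ R_1 \leftrightarrow R_2}}
\left[\begin{array}{ccc|c} 1 & 0 & 1 & b^2 \\ 0 & 1 & 1 & b^1 \\ 0 & 1 & -1 & b^3 - b^2 \end{array}\right]
\xrightarrow{R_3 - R_2}
\left[\begin{array}{ccc|c} 1 & 0 & 1 & b^2 \\ 0 & 1 & 1 & b^1 \\ 0 & 0 & -2 & b^3 - b^2 - b^1 \end{array}\right]
\]
\[
\xrightarrow{\substack{-\frac{1}{2} R_3, \\ R_1 - R_3, \\ R_2 - R_3}}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{1}{2}(b^3 + b^2 - b^1) \\ 0 & 1 & 0 & \frac{1}{2}(b^3 + b^1 - b^2) \\ 0 & 0 & 1 & \frac{1}{2}(b^1 + b^2 - b^3) \end{array}\right].
\]

Because there are no free variables, the original set is linearly independent,
hence is a basis. As a fringe benefit, we learn that


\[
[b]^S = \frac{1}{2} \begin{bmatrix} b^2 + b^3 - b^1 \\ b^3 + b^1 - b^2 \\ b^1 + b^2 - b^3 \end{bmatrix}
= \begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix}.
\]


That is, $x^1 v_1 + x^2 v_2 + x^3 v_3 = b$.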
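The same reduction can be carried out for a specific right-hand side. The Python sketch below is illustrative: `rref` and `coordinates` are helper names introduced here, the basis is $((0,1,1), (1,0,1), (1,1,0))$, and the sample vector $b = (1, 2, 3)$ is an arbitrary choice.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def coordinates(basis, b):
    """Coordinate vector of b with respect to an ordered basis of R^n."""
    n = len(basis)
    augmented = [list(row) + [bi] for row, bi in zip(zip(*basis), b)]
    R, pivots = rref(augmented)
    if pivots != list(range(n)):
        raise ValueError("the given vectors do not form a basis of R^n")
    return [R[i][n] for i in range(n)]

S = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
x = coordinates(S, (1, 2, 3))
print(x)
```

For $b = (1, 2, 3)$ the general formula gives $\frac{1}{2}(2 + 3 - 1,\ 3 + 1 - 2,\ 1 + 2 - 3) = (2, 1, 0)$, and indeed $2v_1 + v_2 = (1, 2, 3)$.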


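Example 2.89. Let $W \subseteq \mathbf{R}^5$ be spanned by $x_1 = (1,1,1,1,1)$ and
$x_2 = (1,2,3,4,5)$. Find a homogeneous linear system $Ax = 0$ whose
solution space is $W$.

The idea is to treat the coefficients of a scalar equation as unknowns;
that is, suppose $A = [a^1\ a^2\ a^3\ a^4\ a^5]$ is a set of coefficients for a
linear equation vanishing on $W$. The vectors $x_1$ and $x_2$ satisfy $Ax = 0$
if and only if

\[
a^1 + a^2 + a^3 + a^4 + a^5 = 0, \qquad
a^1 + 2a^2 + 3a^3 + 4a^4 + 5a^5 = 0.
\]

To solve, form the coefficient matrix and put it in reduced row-echelon
form. The coefficient matrix has rows equal to the vectors in a basis
of $W$:

\[
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{bmatrix}
\xrightarrow{R_2 - R_1}
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix}
\xrightarrow{R_1 - R_2}
\begin{bmatrix} 1 & 0 & -1 & -2 & -3 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix}.
\]

The basic variables are $a^1$ and $a^2$. Solving for the basic variables, we
get $a^1 = a^3 + 2a^4 + 3a^5$, $a^2 = -2a^3 - 3a^4 - 4a^5$. The general solution is

\[
\begin{bmatrix} a^1 \\ a^2 \\ a^3 \\ a^4 \\ a^5 \end{bmatrix}
= \begin{bmatrix} a^3 + 2a^4 + 3a^5 \\ -2a^3 - 3a^4 - 4a^5 \\ a^3 \\ a^4 \\ a^5 \end{bmatrix}
= a^3 \begin{bmatrix} 1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ a^4 \begin{bmatrix} 2 \\ -3 \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ a^5 \begin{bmatrix} 3 \\ -4 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]

The three columns on the right span the solution space; the corresponding
linear equations vanish on $W$, and all together they “cut out” $W$.
That is, $W$ is the solution set of

\[
\begin{aligned}
x^1 - 2x^2 + x^3 &= 0, \\
2x^1 - 3x^2 + x^4 &= 0, \\
3x^1 - 4x^2 + x^5 &= 0,
\end{aligned}
\qquad
A = \begin{bmatrix} 1 & -2 & 1 & 0 & 0 \\ 2 & -3 & 0 & 1 & 0 \\ 3 & -4 & 0 & 0 & 1 \end{bmatrix}.
\]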
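
As a sanity check on Example 2.89, we can confirm numerically that each of the three equations vanishes on both spanning vectors of $W$. This is a small Python sketch added for illustration; the names `A`, `spanning`, and `residuals` are ours.

```python
# Rows of A are the coefficient vectors found in Example 2.89.
A = [(1, -2, 1, 0, 0),
     (2, -3, 0, 1, 0),
     (3, -4, 0, 0, 1)]

# The two vectors spanning W.
spanning = [(1, 1, 1, 1, 1), (1, 2, 3, 4, 5)]

def dot(u, v):
    """Euclidean dot product of two tuples of equal length."""
    return sum(a * b for a, b in zip(u, v))

# residuals[k][i] is the value of the i-th equation on the k-th spanning vector.
residuals = [[dot(row, x) for row in A] for x in spanning]
print(residuals)
```

The check shows only that $W$ is contained in the solution set. Equality follows from a dimension count: $A$ has three pivots, so its solution space has dimension $5 - 3 = 2 = \dim W$.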
Exercises


Exercise 2.1. Let V = F(R,R) be the vector space of real-valued 
functions on R under ordinary addition and scalar multiplication. De- 
termine (with justification) which of the following sets are linearly in- 
dependent. 


(a) $\{1 - t + t^2,\ 1 + t + t^2,\ t + t^2\}$; $\{1 - t + t^2,\ 1 + t + t^2,\ 1 + t^2\}$.


(b) $\{\cos^2 t, \sin^2 t\}$; $\{1, \cos^2 t, \sin^2 t\}$.

(c) {cost, sint}; {1, cost, sint}. 

Exercise 2.2. Find a basis for the subspace $W \subseteq P_5$ defined by

\[
W = \{p \in P_5 : p(-1) = p(0) = p(1) = 0\}
\]

using the following strategy: Write the general element of $P_5$ in the
form $p(t) = a_0 + a_1 t + \cdots + a_5 t^5$, use each of the given conditions to
impose a linear equation on the coefficients, and use row reduction to
solve the resulting system of equations.

Show that $g(t) = t - t^3$ divides each basis element.


Exercise 2.3. Let $a < b < c$ be real numbers, and consider the
quadratic Lagrange interpolation polynomials

\[
\begin{aligned}
e_1(t) &= (t - b)(t - c)/[(a - b)(a - c)], \\
e_2(t) &= (t - a)(t - c)/[(b - a)(b - c)], \\
e_3(t) &= (t - a)(t - b)/[(c - a)(c - b)].
\end{aligned}
\]

(a) Evaluate each polynomial at $t = a$, $b$, and $c$.

(b) Show $(e_j)_{j=1}^3$ is a basis of $P_2$. Hint: Use part (a).

(c) If $p$ is an arbitrary quadratic polynomial, find a formula (in terms
of $a$, $b$, $c$, and $p$) expressing $p$ as a linear combination from
$(e_j)_{j=1}^3$, and write each of $p_0(t) = 1$, $p_1(t) = t$, and
$p_2(t) = t^2$ as a linear combination from $(e_j)_{j=1}^3$. Hint: Use part (a).


Exercise 2.4. Let V = F(R,R) be the vector space of real-valued 
functions on R under ordinary addition and scalar multiplication. 


(a) A function $\phi$ is even if $\phi(-t) = \phi(t)$ for all $t$. Prove that the
set $\mathcal{E}$ of even functions is a subspace of $V$.

(b) A function $\phi$ is odd if $\phi(-t) = -\phi(t)$ for all $t$. Prove that the
set $\mathcal{O}$ of odd functions is a subspace of $V$.

(c) Show that $V = \mathcal{E} + \mathcal{O}$, and determine whether or not the sum is
direct. Hint: If $f$ is an arbitrary function, consider the functions

\[
f_{\mathrm{even}}(t) = \tfrac12\bigl(f(t) + f(-t)\bigr), \qquad
f_{\mathrm{odd}}(t) = \tfrac12\bigl(f(t) - f(-t)\bigr).
\]

(d) Write $f(t) = e^t$ as the sum of an even and an odd function.

Exercise 2.5. Let $(V,+,\cdot)$ be a vector space, $x$, $y$ and $z$ elements of $V$.


(a) Show $\{x, y\}$ is linearly independent if and only if $\{x + y, x - y\}$
is linearly independent.

(b) True or false (with proof): $\{x, y, z\}$ is linearly independent if and
only if $\{x + y, x + z, y + z\}$ is linearly independent.

(c) True or false (with proof): $\{x, y, z\}$ is linearly independent if and
only if $\{x - y, x - z, y - z\}$ is linearly independent.


Exercise 2.6. In $\mathbf{R}^6$, give specific examples of three-dimensional
subspaces $W_1$ and $W_2$ whose intersection has dimension (a) two; (b) one;
(c) zero.

Exercise 2.7. If $A = [A^i_j]$ is a $2 \times 2$ matrix, the trace of $A$ is defined
to be $\operatorname{tr}(A) = A^1_1 + A^2_2$, the sum of the diagonal entries.

(a) Show that the set $W$ of $2 \times 2$ matrices of trace 0 is a subspace
of $\mathbf{R}^{2 \times 2}$.

(b) Following Example 2.73, find bases for $W$, for $\mathrm{Sym}^2 \cap W$, and for
$\mathrm{Skew}^2 \cap W$.

(c) Is it true that $\mathrm{Sym}^2 + W = \mathbf{R}^{2 \times 2}$? Explain.


Exercise 2.8. Determine the dimensions of the spaces of upper tri- 
angular, lower triangular, and diagonal n x n matrices, and verify the 
Dimension Theorem for this example. 
Euclidean Geometry


Vectors allow us to speak of linear combinations, but not of geometric
concepts such as length, angle, and volume. In order to do geometry
(particularly in $\mathbf{R}^n$), we introduce additional structure.


3.1 Inner Products 


Definition 3.1. Let $(V,+,\cdot)$ be a real vector space. An inner product
on $V$ is a function $B : V \times V \to \mathbf{R}$ satisfying the following conditions:

(i) (Bilinearity) For all $x_1$, $x_2$, and $y$ in $V$, and all real numbers $c$,

\[
\begin{aligned}
B(cx_1 + x_2, y) &= cB(x_1, y) + B(x_2, y), \\
B(y, cx_1 + x_2) &= cB(y, x_1) + B(y, x_2).
\end{aligned}
\]

(ii) (Symmetry) For all $x$ and $y$ in $V$, $B(y,x) = B(x,y)$.

(iii) (Positive-definiteness) For all $x$ in $V$, $B(x,x) \geq 0$, with equality
if and only if $x = 0_V$.


Remark 3.2. Bilinearity formalizes a two-sided “distributive law”, al- 
lowing inner products of linear combinations to be expanded in terms 
of inner products of vectors in the combination. 

To establish bilinearity for a particular function B, it suffices to 
verify symmetry and the first “bilinearity” condition. 


Example 3.3. (The Euclidean dot product) On $(\mathbf{R}^n,+,\cdot)$, define

\[
\langle x, y \rangle = x^{\mathsf T} y = x^1 y^1 + x^2 y^2 + \cdots + x^n y^n = \sum_{i=1}^n x^i y^i.
\]

The expression $x^{\mathsf T} y$ is the $1 \times 1$ matrix obtained by multiplying the
transpose of $x$ (a row) with $y$ (a column), viewed as a real number.
Bilinearity amounts to the distributive law: If $x_1 = [x_1^i]$, $x_2 = [x_2^i]$,
and $y = [y^i]$ are arbitrary vectors and $c$ is real, then

\[
\begin{aligned}
\langle cx_1 + x_2, y \rangle
&= \sum_{i=1}^n (cx_1^i + x_2^i) y^i = \sum_{i=1}^n (c x_1^i y^i + x_2^i y^i) \\
&= c \sum_{i=1}^n x_1^i y^i + \sum_{i=1}^n x_2^i y^i = c \langle x_1, y \rangle + \langle x_2, y \rangle.
\end{aligned}
\]

Symmetry and positive-definiteness are clear.


Example 3.4. Let $a < b$ be real. On $V = C([a,b], \mathbf{R})$, the vector space
of continuous, real-valued functions on the interval $[a,b]$ under function
addition and scalar multiplication, the function

\[
B(f,g) = \int_a^b f(t)\, g(t)\, dt
\]

defines an inner product. Bilinearity amounts to the distributive law
plus linearity of the integral; symmetry is clear; positive-definiteness
depends on the formal definition of continuity, and is omitted.

For concreteness, the theorems below are stated only for the dot
product on $\mathbf{R}^n$. The proofs, however, use nothing more than the axioms
for an inner product, and so they hold (with suitable interpretations)
for an arbitrary inner product.


Magnitude 


Definition 3.5. Let $x = (x^1, \dots, x^n)$ be a vector in $\mathbf{R}^n$. The
magnitude of $x$ is the non-negative real number

\[
\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{(x^1)^2 + \cdots + (x^n)^2}
= \Bigl( \sum_{i=1}^n (x^i)^2 \Bigr)^{1/2}.
\]

A vector of magnitude 1 is a unit vector.

Proposition 3.6. If $x$ is a vector in $\mathbf{R}^n$ and $c$ is real, $\|cx\| = |c|\, \|x\|$.
If $x \neq 0^n$, then $x$ may be written uniquely as a positive scalar multiple
of a unit vector:

\[
x = \|x\| \cdot \frac{x}{\|x\|}.
\]

Proof. The definition of magnitude, together with bilinearity, gives

\[
\|cx\| = \sqrt{\langle cx, cx \rangle} = \sqrt{c^2 \langle x, x \rangle} = |c|\, \|x\|.
\]

If $x \neq 0^n$, a scalar multiple $cx$ of $x$ is a unit vector if and only if
$\|cx\| = 1$, if and only if $|c| = 1/\|x\|$, i.e., $c = \pm 1/\|x\|$. The only
positive choice is $c = 1/\|x\|$.

Definition 3.7. If $x \neq 0^n$, the unit vector $x/\|x\|$ is obtained by
normalizing $x$.
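
The dot product, magnitude, and normalization translate directly into code. The following Python sketch is an illustration added here, with function names of our choosing; vectors are represented as tuples of floats, which is an assumption of the sketch rather than anything in the text.

```python
import math

def dot(x, y):
    """Euclidean dot product <x, y> = sum_i x^i y^i."""
    return sum(a * b for a, b in zip(x, y))

def magnitude(x):
    """Magnitude ||x|| = sqrt(<x, x>)."""
    return math.sqrt(dot(x, x))

def normalize(x):
    """Unit vector x / ||x||; undefined for the zero vector."""
    m = magnitude(x)
    if m == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / m for a in x)

x = (3.0, 4.0)
print(magnitude(x), normalize(x))
```

Scaling `x` by a real number `c` scales the magnitude by `abs(c)`, as Proposition 3.6 asserts, and `normalize` refuses the zero vector because no unit vector is determined in that case.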


Remark 3.8. A unit vector may be viewed as a “pure direction”. The 
proposition gives precise meaning to the “principle” that a vector is a 
quantity having both magnitude and direction. 


Theorem 3.9 (Cauchy-Schwarz Inequality). If $x$ and $y$ are vectors
in $\mathbf{R}^n$, then

\[
|\langle x, y \rangle| \leq \|x\|\, \|y\|,
\]

with equality if and only if $x$ and $y$ are proportional.

Proof. If $x = 0^n$, then $\langle x, y \rangle = 0$ and the conclusion of the theorem
is immediate. Otherwise, for each real number $t$, consider the vector
$tx + y$. Expanding $\|tx + y\|^2 = \langle tx + y, tx + y \rangle$ as a function of $t$,

\[
\begin{aligned}
0 \leq \langle tx + y, tx + y \rangle
&= t^2 \langle x, x \rangle + 2t \langle x, y \rangle + \langle y, y \rangle \\
&= t^2 \|x\|^2 + 2t \langle x, y \rangle + \|y\|^2.
\end{aligned}
\]

In other words, $0 \leq At^2 + Bt + C$, with $A = \|x\|^2$, $B = 2 \langle x, y \rangle$, and
$C = \|y\|^2$.

A non-negative quadratic function satisfies $B^2 - 4AC \leq 0$, or
$B^2 \leq 4AC$; otherwise the quadratic formula would give two distinct
real roots, and the quadratic would be negative in between. Plugging
in the definitions of $A$, $B$, and $C$ gives

\[
\langle x, y \rangle^2 \leq \|x\|^2 \|y\|^2, \quad\text{or}\quad |\langle x, y \rangle| \leq \|x\|\, \|y\|.
\]

Equality holds if and only if the quadratic has a double root, if and
only if there exists a real number $t$ such that $tx + y = 0^n$, if and only if
$x$ and $y$ are proportional.
Angle

Theorem 3.10. If $x$ and $y$ are vectors in $\mathbf{R}^n$, then the angle $\theta$ between
$x$ and $y$ satisfies

\[
\langle x, y \rangle = \|x\|\, \|y\| \cos\theta.
\]

Proof. If $x = 0^n$ or $y = 0^n$, the equality in the theorem is immediate. It
remains to handle the case with $x$ and $y$ both non-zero. Let $\theta$ be the
angle between $x$ and $y$ in the plane containing these two vectors. The
triangle with vertices $0^n$, $x$, and $y$ has sides of magnitude $\|x\|$, $\|y\|$, and
$\|x - y\|$.

[Figure: the triangle with sides $\|x\|$, $\|y\|$, and $\|x - y\|$, and angle $\theta$ at the origin.]

By the Law of Cosines,

\[
\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2 \|x\|\, \|y\| \cos\theta.
\]

But the left-hand side can be expanded as

\[
\langle x - y, x - y \rangle = \|x\|^2 + \|y\|^2 - 2 \langle x, y \rangle.
\]

Equating and canceling gives the theorem.


Corollary 3.11. If $x$ and $y$ are non-zero vectors, there is a unique
real number $\theta$ in the interval $[0,\pi]$ such that

\[
\cos\theta = \frac{\langle x, y \rangle}{\|x\|\, \|y\|}.
\]

Remark 3.12. The expression on the right lies in the interval $[-1, 1]$ by
the Cauchy-Schwarz inequality.

Definition 3.13. Two vectors $x$, $y$ in $\mathbf{R}^n$ are orthogonal if $\langle x, y \rangle = 0$.

Example 3.14. The vectors $x = (1,1,1)$ and $y = (1,-2,1)$ in $\mathbf{R}^3$ are
orthogonal.
Example 3.15. If $x = (1,0)$ and $y = (1,1)$, then

\[
\|x\|^2 = \langle x, x \rangle = 1, \qquad \|y\|^2 = \langle y, y \rangle = 2, \qquad \langle x, y \rangle = 1.
\]

Consequently, $\cos\theta = 1/\sqrt{2}$, so $\theta = \pi/4$. This agrees with geometric
expectation, since $x$ is a side and $y$ a diagonal of a square.

We can perform angle computations for vectors we cannot easily
visualize. If $x = (1,0,0,0)$ and $y = (1,1,1,1)$, then

\[
\|x\|^2 = 1, \qquad \|y\|^2 = 4, \qquad \langle x, y \rangle = 1.
\]

Consequently, $\cos\theta = 1/2$, so $\theta = \pi/3$. (Generally, the angle between
a standard basis vector and the “diagonal of an $n$-cube” is not a “nice”
multiple of $\pi$.)
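
The angle computations of Example 3.15 can be reproduced numerically from Corollary 3.11. This Python sketch is added for illustration only; the function name `angle` is ours, and the clamping step is a floating-point precaution rather than part of the mathematics.

```python
import math

def dot(x, y):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(x, y))

def angle(x, y):
    """Angle in [0, pi] between non-zero vectors x and y (Corollary 3.11)."""
    c = dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
    # Cauchy-Schwarz guarantees -1 <= c <= 1; clamp to absorb rounding error.
    c = max(-1.0, min(1.0, c))
    return math.acos(c)

print(angle((1, 0), (1, 1)))              # close to pi/4
print(angle((1, 0, 0, 0), (1, 1, 1, 1)))  # close to pi/3
```

The same function detects orthogonality: for the vectors of Example 3.14 it returns a value within rounding error of $\pi/2$.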


Theorem 3.16 (The Pythagorean Theorem in $\mathbf{R}^n$). If $x$ and $y$ are
vectors in $\mathbf{R}^n$, then

\[
\|x + y\|^2 = \|x\|^2 + \|y\|^2
\]

if and only if $x$ and $y$ are orthogonal.

Proof. Expanding the left-hand side as a dot product,

\[
\|x + y\|^2 = \langle x + y, x + y \rangle = \|x\|^2 + \|y\|^2 + 2 \langle x, y \rangle.
\]

The theorem follows immediately.


3.2. Orthogonal Projection 


Definition 3.17. Let $a$ be a non-zero vector in $\mathbf{R}^n$. If $x \in \mathbf{R}^n$, a
pair of vectors $x_a$, $x_\perp$ is called a decomposition of $x$ into parallel
and orthogonal components with respect to $a$ if (i) $x = x_a + x_\perp$;
(ii) $x_a = ca$ for some real number $c$; (iii) $\langle x_\perp, a \rangle = 0$.

Theorem 3.18. Let $a$ be a non-zero vector in $\mathbf{R}^n$. For each $x$ in $\mathbf{R}^n$,
there is a unique decomposition of $x$ into parallel and orthogonal
components with respect to $a$, given by the formulas

\[
\operatorname{proj}_a(x) = \frac{\langle x, a \rangle}{\langle a, a \rangle} \cdot a, \qquad
\operatorname{perp}_a(x) = x - \operatorname{proj}_a(x).
\]

Proof. Let $x$ be an arbitrary vector in $\mathbf{R}^n$. If $c$ is a real number,
the difference $x - ca$ is orthogonal to $a$ if and only if $c$ satisfies the
equation $0 = \langle x - ca, a \rangle = \langle x, a \rangle - c \langle a, a \rangle$. By elementary algebra,
$c = \langle x, a \rangle / \langle a, a \rangle$.


AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31 




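Definition 3.19. If $a$ is a non-zero vector in $\mathbf{R}^n$, then
$\operatorname{proj}_a(x)$ is called the (orthogonal) projection of $x$ on $a$.

Remark 3.20. If $\|u\| = 1$, then $\operatorname{proj}_u(x) = \langle x, u \rangle \cdot u$.

Example 3.21. Let $a = (2,1)$, and $x = (x^1, x^2)$ an arbitrary vector
in $\mathbf{R}^2$. We have $\langle a, a \rangle = 5$, $\langle x, a \rangle = 2x^1 + x^2$, and

\[
\begin{aligned}
\operatorname{proj}_a(x) &= \frac{2x^1 + x^2}{5}\,(2,1) = \tfrac15 (4x^1 + 2x^2,\ 2x^1 + x^2), \\
\operatorname{perp}_a(x) &= x - \operatorname{proj}_a(x) = \tfrac15 (x^1 - 2x^2,\ -2x^1 + 4x^2).
\end{aligned}
\]

As expected, $\langle \operatorname{perp}_a(x), a \rangle = \tfrac15 [2(x^1 - 2x^2) + (-2x^1 + 4x^2)] = 0$.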
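
The decomposition of Theorem 3.18 is short to implement. The sketch below is an illustration in Python with names of our choosing (`proj`, `perp`); it assumes integer components so that exact rational arithmetic can be used.

```python
from fractions import Fraction

def dot(x, y):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(x, y))

def proj(a, x):
    """Orthogonal projection of x on the non-zero vector a."""
    if dot(a, a) == 0:
        raise ValueError("a must be non-zero")
    c = Fraction(dot(x, a), dot(a, a))  # c = <x, a> / <a, a>
    return tuple(c * t for t in a)

def perp(a, x):
    """Component of x orthogonal to a: x - proj_a(x)."""
    return tuple(s - t for s, t in zip(x, proj(a, x)))

a, x = (2, 1), (3, 4)
print(proj(a, x), perp(a, x), dot(perp(a, x), a))
```

With $a = (2,1)$ and $x = (3,4)$ the formulas of Example 3.21 give $\operatorname{proj}_a(x) = (4,2)$ and $\operatorname{perp}_a(x) = (-1,2)$, and the last printed value confirms that the perpendicular component is orthogonal to $a$.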
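Example 3.22. If $\theta$ is real, $a = (\cos\theta, \sin\theta)$, and $a^\perp = (-\sin\theta, \cos\theta)$,
then

\[
\begin{aligned}
\operatorname{proj}_a(x) &= (x^1 \cos\theta + x^2 \sin\theta)(\cos\theta, \sin\theta), \\
\operatorname{proj}_{a^\perp}(x) &= (x^2 \cos\theta - x^1 \sin\theta)(-\sin\theta, \cos\theta) = x - \operatorname{proj}_a(x).
\end{aligned}
\]

Example 3.23. If $a = (1,1,1,1)$, then $\langle a, a \rangle = 4$. For an arbitrary
vector $x = (x^1, x^2, x^3, x^4)$ in $\mathbf{R}^4$, we have

\[
\operatorname{proj}_a(x) = \frac{x^1 + x^2 + x^3 + x^4}{4}\,(1,1,1,1), \qquad
\operatorname{perp}_a(x) = x - \operatorname{proj}_a(x).
\]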


Orthonormal Bases and Orthogonal Matrices 
Definition 3.24. A set $(u_j)_{j=1}^k$ in $\mathbf{R}^n$ is orthonormal if each element
is a unit vector and any two elements are orthogonal, i.e., $\langle u_i, u_j \rangle = \delta_{ij}$.

Theorem 3.25. If $S = (u_j)_{j=1}^k$ is an orthonormal set of vectors in $\mathbf{R}^n$,
and if $x = \sum_{j=1}^k x^j u_j$ for some scalars $(x^j)_{j=1}^k$, then $x^j = \langle x, u_j \rangle$. In
particular, $S$ is linearly independent.

Proof. If $x = \sum_{j=1}^k x^j u_j$, then for each $i$,

\[
\langle x, u_i \rangle = \Bigl\langle \sum_{j=1}^k x^j u_j, u_i \Bigr\rangle
= \sum_{j=1}^k x^j \langle u_j, u_i \rangle = \sum_{j=1}^k x^j \delta_{ji} = x^i.
\]

If $x = 0^n$ is a linear combination from $S$, then $x^j = \langle x, u_j \rangle = 0$ for
all $j$, so $S$ is linearly independent.


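Corollary 3.26. If $S = (u_j)_{j=1}^n$ is an orthonormal set in $\mathbf{R}^n$, then
$S$ is a basis, and for every $x$ in $\mathbf{R}^n$,

\[
x = \langle x, u_1 \rangle \cdot u_1 + \cdots + \langle x, u_n \rangle \cdot u_n = \sum_{j=1}^n \langle x, u_j \rangle \cdot u_j.
\]

Proof. Since $S$ is a linearly independent set of $n$ elements in $\mathbf{R}^n$ by
Theorem 3.25, $S$ is a basis by Corollary 2.68. It follows that an arbitrary
vector $x$ may be written as a linear combination from $S$. Theorem 3.25
gives the form of the coefficients.

Corollary 3.27. If $(u_j)_{j=1}^n$ is orthonormal in $\mathbf{R}^n$, then
$\sum_{j=1}^n u_j u_j^{\mathsf T} = I_n$.

Proof. For every $x$ in $\mathbf{R}^n$, $u_j^{\mathsf T} x = \langle x, u_j \rangle$. By Corollary 3.26,

\[
\Bigl( \sum_{j=1}^n u_j u_j^{\mathsf T} \Bigr) x = \sum_{j=1}^n u_j (u_j^{\mathsf T} x)
= \sum_{j=1}^n \langle x, u_j \rangle \cdot u_j = x = I_n x
\]

for all $x$, so $\sum_{j=1}^n u_j u_j^{\mathsf T} = I_n$ by Corollary 1.23.

Definition 3.28. An $n \times n$ real matrix $P$ is orthogonal if $P^{-1} = P^{\mathsf T}$,
i.e., if $P^{\mathsf T} P = I_n$ and $P P^{\mathsf T} = I_n$.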


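Example 3.29. Let $\theta$ be real. The following matrices are orthogonal:

\[
\mathrm{Rot}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad
\mathrm{Ref}_\theta = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.
\]

(The notation is explained in Examples 4.17 and 4.19.)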

Example 3.30. A permutation matrix (Exercise 1.13) is orthogonal. 

Theorem 3.31. The following are equivalent for an $n \times n$ real matrix $P$:

(i) The columns of $P$ form an orthonormal set in $\mathbf{R}^n$.

(ii) The columns of $P^{\mathsf T}$ form an orthonormal set in $\mathbf{R}^n$.

(iii) $P$ is orthogonal.

Proof. In general, if $A$ and $B$ are $n \times n$ matrices, the $(i,j)$ entry of $A^{\mathsf T} B$
is the dot product of the $i$th column of $A$ and the $j$th column of $B$.
Similarly, the $(i,j)$ entry of $A B^{\mathsf T}$ is the dot product of the $i$th column
of $A^{\mathsf T}$ and the $j$th column of $B^{\mathsf T}$.

Since $I_n = [\delta^i_j]$, the equation $P^{\mathsf T} P = I_n$ holds if and only if the
columns of $P$ are orthonormal, and $P P^{\mathsf T} = I_n$ holds if and only if the
columns of $P^{\mathsf T}$ are orthonormal. The content of the proof is to show
each of these equations implies the other.

((i) if and only if (ii)). If the columns of $P$ are orthonormal, then
$P^{\mathsf T} P = I_n$. Further, the columns of $P$ are a basis of $\mathbf{R}^n$ by
Theorem 3.25. Consequently, the reduced row-echelon form of $P^{\mathsf T}$ is $I_n$, so
$P^{\mathsf T}$ is invertible by Theorem 1.49. By Theorem 1.30 (ii), $P P^{\mathsf T} = I_n$ as
well, so the columns of $P^{\mathsf T}$ are orthonormal.

The converse implication follows mutatis mutandis by exchanging
the roles of $P$ and $P^{\mathsf T}$.

Since (iii) is equivalent to “$P^{\mathsf T} P = I_n$ and $P P^{\mathsf T} = I_n$”, the proof is
complete.


Theorem 3.32. The set $O(n)$ of $n \times n$ real orthogonal matrices is a
group under matrix multiplication.

Proof. The identity matrix $I_n$ is orthogonal, and acts as the multiplicative
identity element in $O(n)$.

If $P$ and $Q$ are orthogonal, then

\[
(PQ)^{\mathsf T} = Q^{\mathsf T} P^{\mathsf T} = Q^{-1} P^{-1} = (PQ)^{-1},
\]

so $PQ$ is orthogonal. That is, $O(n)$ is closed under multiplication.

By Theorem 3.31, $P$ is orthogonal if and only if $P^{\mathsf T} = P^{-1}$ is
orthogonal, so $O(n)$ is closed under inversion.


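Theorem 3.33. The following are equivalent for an $n \times n$ real matrix $P$:

(i) $\langle Px, Py \rangle = \langle x, y \rangle$ for all $x$, $y$ in $\mathbf{R}^n$.

(ii) $P$ is orthogonal.

Proof. Since $\langle x, y \rangle = x^{\mathsf T} y$, we have

\[
\langle Px, Py \rangle = (Px)^{\mathsf T} (Py) = (x^{\mathsf T} P^{\mathsf T})(Py) = x^{\mathsf T} (P^{\mathsf T} P) y,
\]

and therefore

\[
\langle Px, Py \rangle - \langle x, y \rangle = x^{\mathsf T} (P^{\mathsf T} P) y - x^{\mathsf T} (I_n) y = x^{\mathsf T} (P^{\mathsf T} P - I_n) y
\]

for all $x$, $y$. It follows that $\langle Px, Py \rangle = \langle x, y \rangle$ for all $x$, $y$ in $\mathbf{R}^n$ if and
only if $P^{\mathsf T} P - I_n = 0^{n \times n}$, if and only if $P$ is orthogonal.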
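
Orthogonality is easy to test numerically. The Python sketch below is added for illustration, with helper names of our choosing; it checks the condition $P^{\mathsf T} P = I_n$ up to a tolerance `tol` (an assumption needed for floating-point entries) and then spot-checks that a rotation preserves a dot product, as Theorem 3.33 predicts.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product; matrices are lists of rows."""
    return [[dot(row, col) for col in zip(*B)] for row in A]

def is_orthogonal(P, tol=1e-12):
    """True if P^T P equals the identity up to tol (P square)."""
    n = len(P)
    PtP = matmul(transpose(P), P)
    return all(abs(PtP[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflection(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def apply(P, x):
    """Matrix-vector product P x."""
    return [dot(row, x) for row in P]

P = rotation(1.1)
x, y = [3.0, -1.0], [2.0, 5.0]
print(is_orthogonal(P), dot(apply(P, x), apply(P, y)), dot(x, y))
```

For a square matrix it suffices to test $P^{\mathsf T} P = I_n$, since Theorem 3.31 shows this equation implies $P P^{\mathsf T} = I_n$.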
The Gram-Schmidt Algorithm


Theorem 3.34. If $W$ is a subspace of $(\mathbf{R}^n,+,\cdot)$, then $W$ has an
orthonormal basis.

Proof. The idea is to start from an arbitrary basis $(v_j)_{j=1}^m$ of $W$ and
“orthonormalize” the vectors inductively, constructing, for $k = 1, \dots, m$,
an orthonormal basis $(u_j)_{j=1}^k$ of the space $W_k = \operatorname{Span}(v_j)_{j=1}^k$, see
Figure 3.1. The process is called the Gram-Schmidt algorithm. When
$k = m$, we obtain an orthonormal basis of $W$.

Since $v_1 \neq 0^n$, we may set $u_1 = v_1/\|v_1\|$. Assume inductively that
for some $k \geq 1$, we have constructed an orthonormal basis $(u_j)_{j=1}^k$ of
the space $W_k = \operatorname{Span}(v_j)_{j=1}^k$. Define

\[
u'_{k+1} = v_{k+1} - \sum_{j=1}^k \langle v_{k+1}, u_j \rangle \cdot u_j.
\]

Note that $u'_{k+1} \neq 0^n$ since $v_{k+1}$ does not lie in $W_k = \operatorname{Span}(u_j)_{j=1}^k$,
and that $\langle u'_{k+1}, u_j \rangle = 0$ for $j = 1, \dots, k$ by direct computation. We
may therefore put $u_{k+1} = u'_{k+1}/\|u'_{k+1}\|$.

Since each element of the linearly independent set $(u_j)_{j=1}^{k+1}$ is a linear
combination from $(v_j)_{j=1}^{k+1}$, each set is a basis of $W_{k+1}$.

Figure 3.1: The Gram-Schmidt algorithm.


The Orthogonal Complement of a Subspace 


Definition 3.35. If $W$ is a subspace of $(\mathbf{R}^n,+,\cdot)$ equipped with the
Euclidean inner product, the orthogonal complement $W^\perp$ is

\[
W^\perp = \{x \in \mathbf{R}^n : \langle x, w \rangle = 0 \text{ for all } w \text{ in } W\}.
\]

Remark 3.36. The fact that $W^\perp$ is a subspace comes from bilinearity
of the dot product: If $x$, $y$ are in $W^\perp$ and $c$ is real, then

\[
\langle cx + y, w \rangle = c \langle x, w \rangle + \langle y, w \rangle = 0 \quad \text{for all } w \text{ in } W.
\]


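Theorem 3.37. Let $W$ be a subspace of $(\mathbf{R}^n,+,\cdot)$ and $(u_j)_{j=1}^m$ an
orthonormal basis of $W$. If $x$ is an arbitrary vector in $\mathbf{R}^n$, define

\[
\operatorname{proj}_W(x) = \sum_{j=1}^m \langle x, u_j \rangle \cdot u_j.
\]

(i) We have $\operatorname{proj}_W(x) \in W$ and $x - \operatorname{proj}_W(x) \in W^\perp$.

(ii) For every $w$ in $W$, we have $\|x - \operatorname{proj}_W(x)\| \leq \|x - w\|$, with
equality if and only if $w = \operatorname{proj}_W(x)$.

Proof. (i). Since $S = (u_j)_{j=1}^m \subseteq W$ and $\operatorname{proj}_W(x)$ is a linear
combination from $S$, $\operatorname{proj}_W(x) \in W$.

The proof of Theorem 3.25 shows that $\langle \operatorname{proj}_W(x), u_j \rangle = \langle x, u_j \rangle$
for $j = 1, \dots, m$. It follows that $\langle x - \operatorname{proj}_W(x), u_j \rangle = 0$ for
$j = 1, \dots, m$. Since each element of $W$ is a linear combination from
$(u_j)_{j=1}^m$, $\langle x - \operatorname{proj}_W(x), w \rangle = 0$ for all $w$ in $W$, i.e.,
$x - \operatorname{proj}_W(x) \in W^\perp$.

(ii). Let $w$ be an arbitrary vector in $W$. Since $\operatorname{proj}_W(x) - w$ is
in $W$ (as a difference of vectors in $W$) and therefore orthogonal to
$x - \operatorname{proj}_W(x)$ by part (i), the Pythagorean Theorem gives

\[
\begin{aligned}
\|x - \operatorname{proj}_W(x)\|^2
&\leq \|x - \operatorname{proj}_W(x)\|^2 + \|\operatorname{proj}_W(x) - w\|^2 \\
&= \|(x - \operatorname{proj}_W(x)) + (\operatorname{proj}_W(x) - w)\|^2 \\
&= \|x - w\|^2,
\end{aligned}
\]

with equality if and only if $w = \operatorname{proj}_W(x)$.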
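
Both parts of Theorem 3.37 can be observed on a small example. The following Python sketch is an illustration added here; the plane $W$ and its orthonormal basis `onb` are sample data of our choosing, and `proj_subspace` and `dist` are hypothetical helper names.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def proj_subspace(x, onb):
    """proj_W(x) = sum_j <x, u_j> u_j, where onb is an orthonormal basis of W."""
    p = [0.0] * len(x)
    for u in onb:
        c = dot(x, u)
        p = [pi + c * ui for pi, ui in zip(p, u)]
    return p

def dist(x, y):
    """Magnitude of x - y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

s = 1 / math.sqrt(2)
onb = [(s, s, 0.0), (0.0, 0.0, 1.0)]  # orthonormal basis of a plane W in R^3
x = (1.0, 3.0, 2.0)
p = proj_subspace(x, onb)
residual = [a - b for a, b in zip(x, p)]
print(p, residual)
```

Here the projection is (up to rounding) $(2,2,2)$ and the residual $(-1,1,0)$ is orthogonal to both basis vectors. Any other element of $W$, such as $(1,1,5)$, lies farther from $x$ than the projection does.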


Remark 3.38. Though the definition of $\operatorname{proj}_W(x)$ depended on a choice
of orthonormal basis of $W$, the value does not: Property (ii) shows there
is at most one element of $W$ that is closer to $x$ than all other elements
of $W$, and $\operatorname{proj}_W(x)$ satisfies this “best approximation property”.

Corollary 3.39. If $W \subseteq \mathbf{R}^n$ is a subspace, then $\mathbf{R}^n = W \oplus W^\perp$.




Proof. ($W \cap W^\perp = \{0^n\}$). If $x \in W \cap W^\perp$, then $x$ is orthogonal to
itself. That is, $\langle x, x \rangle = 0$. By positive-definiteness, $x = 0^n$.

($\mathbf{R}^n = W + W^\perp$). By the Gram-Schmidt algorithm, $W$ has an
orthonormal basis $(u_j)_{j=1}^m$. If $x \in \mathbf{R}^n$, then by Theorem 3.37,

\[
x = \operatorname{proj}_W(x) + \bigl(x - \operatorname{proj}_W(x)\bigr) \in W + W^\perp.
\]

Corollary 3.40. If $W$ is a $k$-dimensional subspace of $(\mathbf{R}^n,+,\cdot)$, there
exists a homogeneous linear system $Ax = 0^{n-k}$ of $(n-k)$ equations in
$n$ variables whose solution set is precisely $W$.

Proof. A linear equation $a_1 x^1 + \cdots + a_n x^n = 0$ may be interpreted as
asserting that the product of the row matrix $a^{\mathsf T} = [a_j]$ with the column
$x = [x^j]$ is 0, i.e., that the vectors $a = [a^j]$ and $x$ are orthogonal.

Let $W \subseteq \mathbf{R}^n$ be a subspace. By Corollary 3.39, the orthogonal
complement $W^\perp$ has dimension $(n - k)$. Pick a basis $(a_i)_{i=1}^{n-k}$, with
$a_i = [a_i^j]$, and form the matrix $A$ whose $i$th row is $a_i^{\mathsf T}$.

If $x \in \mathbf{R}^n$, then $Ax = 0^{n-k}$ if and only if $\langle a_i, x \rangle = 0$ for each
$i = 1, \dots, n-k$, if and only if $x \in (W^\perp)^\perp$, if and only if $x \in W$ by
Corollary 3.39.


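Example 3.41. Let $v_0 = (1,1,1,1)$, and $W = \operatorname{Span}(v_0)^\perp \subseteq \mathbf{R}^4$. Find
an orthonormal basis of $W$.

A vector $x = (x^1, x^2, x^3, x^4)$ is in $W$ if and only if

\[
0 = \langle x, v_0 \rangle = x^1 + x^2 + x^3 + x^4.
\]

The coefficient matrix $[1\ 1\ 1\ 1]$ is already in reduced row-echelon
form. The variables $x^2$, $x^3$, and $x^4$ are free, and the general solution is

\[
x = \begin{bmatrix} -(x^2 + x^3 + x^4) \\ x^2 \\ x^3 \\ x^4 \end{bmatrix}
= x^2 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x^3 \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x^4 \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]

Applying Gram-Schmidt to the columns $(v_j)_{j=1}^3$ on the right-hand side,

\[
u_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}.
\]

Next, since $\langle v_2, u_1 \rangle = 1/\sqrt{2}$,

\[
u'_2 = v_2 - \langle v_2, u_1 \rangle \cdot u_1
= \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} - \frac12 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
= \frac12 \begin{bmatrix} -1 \\ -1 \\ 2 \\ 0 \end{bmatrix};
\]

normalizing gives

\[
u_2 = \frac{u'_2}{\|u'_2\|} = \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ -1 \\ 2 \\ 0 \end{bmatrix}.
\]

Finally,

\[
\begin{aligned}
u'_3 &= v_3 - \langle v_3, u_1 \rangle \cdot u_1 - \langle v_3, u_2 \rangle \cdot u_2 \\
&= \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix} - \frac12 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
- \frac16 \begin{bmatrix} -1 \\ -1 \\ 2 \\ 0 \end{bmatrix}
= \frac13 \begin{bmatrix} -1 \\ -1 \\ -1 \\ 3 \end{bmatrix},
\end{aligned}
\]

so

\[
u_3 = \frac{u'_3}{\|u'_3\|} = \frac{1}{2\sqrt{3}} \begin{bmatrix} -1 \\ -1 \\ -1 \\ 3 \end{bmatrix}.
\]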

Remark 3.42. Normalizing a vector is equivalent to normalizing a non- 
zero scalar multiple; in the preceding example, we drop the fraction 
multipliers before computing magnitudes. 
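
The inductive construction in the proof of Theorem 3.34 is an algorithm in the literal sense. The following Python sketch (an illustration added here, with names and the tolerance `tol` chosen by us) implements it in floating point and reruns Example 3.41.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a linearly independent list of vectors, in order."""
    basis = []
    for v in vectors:
        # Subtract from v its components along the unit vectors found so far.
        w = list(v)
        for u in basis:
            c = dot(v, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

vs = [(-1, 1, 0, 0), (-1, 0, 1, 0), (-1, 0, 0, 1)]
us = gram_schmidt(vs)
for u in us:
    print(u)
```

The three printed vectors agree, up to rounding, with $u_1$, $u_2$, $u_3$ of Example 3.41. In floating-point practice a "modified" variant that projects the running remainder `w` rather than the original `v` is often preferred for numerical stability; the version above follows the formula in the proof.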


3.3. The Determinant 


Definition 3.43. Let $(v_j)_{j=1}^n$ be a set of vectors in $\mathbf{R}^n$. The box (or
parallelepiped) they span is the set of linear combinations

\[
\sum_{j=1}^n x^j v_j, \qquad 0 \leq x^j \leq 1 \text{ for } j = 1, \dots, n.
\]

The vector $v_j$ is the $j$th edge of the box.

The unit cube is the box spanned by the standard basis $(e_j)_{j=1}^n$.

We wish to define a notion of “oriented ($n$-dimensional) volume”
for a box together with an ordering of its edges, generalizing oriented
area of a plane parallelogram, Figure 3.2. Our approach is to give
axioms for a real-valued function on the set $(\mathbf{R}^n)^n$ of ordered $n$-tuples
of vectors in $\mathbf{R}^n$ in such a way that oriented volume can be computed
by assembling $(v_j)$ into an $n \times n$ matrix and using row reduction.

Figure 3.2: Positive and negative oriented area.

Along the way, we find a polynomial formula for oriented volume
in terms of the components of the $(v_j)$. The catch is, the formula
contains $n!$ summands. For matrices of size $2 \times 2$ (two terms) and $3 \times 3$
(six terms) the formula is suitable for practical use. Even for $10 \times 10$
matrices ($10! = 3{,}628{,}800$ terms), however, row reduction is the only
serious approach for computing oriented volume.


Theorem 3.44. There exists a unique function $\operatorname{vol} : (\mathbf{R}^n)^n \to \mathbf{R}$
satisfying the following conditions for all $(v_j)_{j=1}^n$ in $\mathbf{R}^n$:

(i) For all $x_1$, $x_2$ in $\mathbf{R}^n$ and all real $c$,

\[
\operatorname{vol}(cx_1 + x_2, v_2, \dots, v_n) = c \operatorname{vol}(x_1, v_2, \dots, v_n) + \operatorname{vol}(x_2, v_2, \dots, v_n).
\]

(ii) Swapping two arguments changes the sign:

\[
\operatorname{vol}(\dots, v_j, \dots, v_i, \dots) = -\operatorname{vol}(\dots, v_i, \dots, v_j, \dots).
\]

In particular, $\operatorname{vol}(\dots, v, \dots, v, \dots) = 0$.

(iii) $\operatorname{vol}(e_1, e_2, \dots, e_n) = 1$.

Remark 3.45. The “distributive law” in (i) extends to arbitrary linear
combinations in the first argument by induction on the number of
summands.

In conjunction with (ii), the property of skew-symmetry, there is
a similar distributive law in the $j$th argument: Swap the first and $j$th
arguments using (ii), apply (i) to break up the linear combination, then
re-swap the first and $j$th arguments in each summand; the overall effect
is to break up a linear combination in the $j$th argument.
Before giving a proof, we look explicitly at the cases $n = 2$ and 3.

Example 3.46. If $v_1 = a^1 e_1 + a^2 e_2$ and $v_2 = b^1 e_1 + b^2 e_2$, then
$\operatorname{vol}(v_1, v_2) = a^1 b^2 - a^2 b^1$. Indeed, the “distributive law” gives

\[
\begin{aligned}
\operatorname{vol}(v_1, v_2) &= \operatorname{vol}(a^1 e_1 + a^2 e_2,\ b^1 e_1 + b^2 e_2) \\
&= a^1 b^1 \operatorname{vol}(e_1, e_1) + a^1 b^2 \operatorname{vol}(e_1, e_2) \\
&\quad + a^2 b^1 \operatorname{vol}(e_2, e_1) + a^2 b^2 \operatorname{vol}(e_2, e_2).
\end{aligned}
\]

Since $\operatorname{vol}(e_1, e_1) = \operatorname{vol}(e_2, e_2) = 0$, and $\operatorname{vol}(e_2, e_1) = -\operatorname{vol}(e_1, e_2)$
by (ii),

\[
\begin{aligned}
\operatorname{vol}(v_1, v_2) &= a^1 b^2 \operatorname{vol}(e_1, e_2) + a^2 b^1 \operatorname{vol}(e_2, e_1) \\
&= (a^1 b^2 - a^2 b^1) \operatorname{vol}(e_1, e_2) \\
&= a^1 b^2 - a^2 b^1.
\end{aligned}
\]

Conceptually, the “distributive law” allows $\operatorname{vol}(v_1, v_2)$ to be expressed
in terms of $\operatorname{vol}(e_i, e_j)$ for all $2^2 = 4$ ordered pairs of indices $i$ and $j$.
Terms with repeated index are 0, so only $2! = 2$ terms are non-zero,
one with positive sign, one with negative sign.

Example 3.47. If $v_1 = (a^1, a^2, a^3)$, $v_2 = (b^1, b^2, b^3)$, $v_3 = (c^1, c^2, c^3)$,
then

\[
\operatorname{vol}(v_1, v_2, v_3) = a^1 (b^2 c^3 - b^3 c^2) + a^2 (b^3 c^1 - b^1 c^3) + a^3 (b^1 c^2 - b^2 c^1).
\]

This can be handled by brute force, but now instead of $2^2 = 4$ terms
(only two of which are non-zero), there are $3^3 = 27$ terms, of which
only $3! = 6$ are non-zero. Examining the structure of the computation
saves enough work to be worthwhile.

When we use $v_1 = a^1 e_1 + a^2 e_2 + a^3 e_3$ (etc.) to expand $\operatorname{vol}(v_1, v_2, v_3)$,
we get summands

\[
a^i b^j c^k \operatorname{vol}(e_i, e_j, e_k),
\]

with each index $i$, $j$, $k$ taking on each value 1, 2, 3. Because of
skew-symmetry, any term with a repeated index is zero. Thus, there is one
non-zero term for each permutation of the indices 1, 2, 3, and

\[
\operatorname{vol}(e_i, e_j, e_k) = \pm \operatorname{vol}(e_1, e_2, e_3) = \pm 1,
\]

the sign being determined by the number of “swaps” needed to put
$i$, $j$, $k$ in increasing order. The end result is

\[
\begin{aligned}
\operatorname{vol}(v_1, v_2, v_3) &= a^1 b^2 c^3 - a^1 b^3 c^2 + a^2 b^3 c^1 - a^2 b^1 c^3 + a^3 b^1 c^2 - a^3 b^2 c^1 \\
&= a^1 (b^2 c^3 - b^3 c^2) + a^2 (b^3 c^1 - b^1 c^3) + a^3 (b^1 c^2 - b^2 c^1).
\end{aligned}
\]
Proof. (Uniqueness). Because an arbitrary vector $v_j$ can be expressed
(uniquely) as a linear combination of standard basis vectors, the
“distributive law” shows vol is completely determined by its values on
arbitrary $n$-tuples of standard basis vectors.

By (ii), if any vector is repeated as an argument, the corresponding
summand is 0. That is, vol is determined by its values on $n$-tuples of
distinct standard basis vectors, i.e., on permutations of the standard
basis. But every ordering of the standard basis vectors can be put
into “standard” order by swapping arguments. It follows that vol is
completely determined by $\operatorname{vol}(e_1, \dots, e_n)$, which is equal to 1 by (iii).

To implement the preceding conceptual calculation as a formula,
write, for each $j = 1, \dots, n$,

\[
v_j = A^1_j e_1 + \cdots + A^n_j e_n = \sum_{i=1}^n A^i_j e_i.
\]

Each non-zero term in $\operatorname{vol}(v_1, \dots, v_n)$ comes from a permutation $\sigma$. If
$(-1)^\sigma$ denotes the sign of $\sigma$, i.e., $+1$ if $\sigma$ is an even permutation and
$-1$ if $\sigma$ is odd, then the term corresponding to $\sigma$ is

\[
A^{\sigma(1)}_1 A^{\sigma(2)}_2 \cdots A^{\sigma(n)}_n \operatorname{vol}(e_{\sigma(1)}, e_{\sigma(2)}, \dots, e_{\sigma(n)})
= (-1)^\sigma A^{\sigma(1)}_1 A^{\sigma(2)}_2 \cdots A^{\sigma(n)}_n.
\]

Summing over all permutations in the symmetric group $S_n$ gives

\[
\operatorname{vol}(v_1, \dots, v_n) = \sum_{\sigma \in S_n} (-1)^\sigma A^{\sigma(1)}_1 A^{\sigma(2)}_2 \cdots A^{\sigma(n)}_n.
\]

(Existence). It remains to show the preceding satisfies (i)-(iii).

Condition (i) follows from the distributive law for real numbers,
because each summand contains precisely one factor coming from $v_1$.

To prove Condition (ii), let $\tau = (i\ j)$ exchange $i$ and $j$, and note
that $(-1)^\sigma = -(-1)^{\sigma\tau}$ because $\tau$ is odd. The general summand after
swapping $v_i$ and $v_j$ is

\[
(-1)^\sigma A^{\sigma(1)}_1 \cdots A^{\sigma(i)}_j \cdots A^{\sigma(j)}_i \cdots A^{\sigma(n)}_n
= -(-1)^{\sigma\tau} A^{\sigma\tau(1)}_1 \cdots A^{\sigma\tau(i)}_i \cdots A^{\sigma\tau(j)}_j \cdots A^{\sigma\tau(n)}_n.
\]

Summing over $\sigma$ in $S_n$ amounts to summing over $\sigma\tau$.

To prove (iii), note that $e_j = \sum_i \delta^i_j e_i$, so

\[
\operatorname{vol}(e_1, \dots, e_n) = \sum_{\sigma \in S_n} (-1)^\sigma \delta^{\sigma(1)}_1 \delta^{\sigma(2)}_2 \cdots \delta^{\sigma(n)}_n.
\]

The only summand with a non-zero contribution comes from the identity
permutation, and the contribution is 1.
Example 3.48. The box spanned by $((2, -3), (1, -4))$ has signed area

\[
\det \begin{bmatrix} 2 & 1 \\ -3 & -4 \end{bmatrix} = (2)(-4) - (1)(-3) = -5.
\]

That is, the box (parallelogram) has area 5, and the small angle from
$(2, -3)$ to $(1, -4)$ is oppositely oriented to the small angle from $e_1$ to $e_2$
(because the sign is negative).

Definition 3.49. The determinant of an $n \times n$ matrix $A = [A^i_j]$ is

\[
\det A = \sum_{\sigma \in S_n} (-1)^\sigma A^{\sigma(1)}_1 A^{\sigma(2)}_2 \cdots A^{\sigma(n)}_n.
\]
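
The permutation-sum formula can be evaluated literally for small matrices. The sketch below is an illustrative Python implementation (function names are ours); a matrix is a list of rows with 0-based indices, so `A[i][j]` plays the role of the entry in row $i+1$ and column $j+1$, and the sign of a permutation is computed by counting inversions.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of 0..n-1: +1 if the number of inversions is even."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """Determinant as a signed sum over all n! permutations."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term *= A[sigma[j]][j]  # entry in row sigma(j), column j
        total += term
    return total

# Columns are the vectors (2, -3) and (1, -4) of Example 3.48.
print(det_by_permutations([[2, 1], [-3, -4]]))
```

Because the loop visits all $n!$ permutations, this approach is only reasonable for very small $n$, in line with the remark preceding Theorem 3.44.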

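Theorem 3.50. If $A$ is an $n \times n$ matrix, then $\det A = \det A^{\mathsf T}$.

Proof. Since $(-1)^{\sigma^{-1}} = (-1)^\sigma$, summing over $\sigma^{-1}$ in $S_n$ gives

\[
\begin{aligned}
\det A &= \sum_{\sigma^{-1} \in S_n} (-1)^{\sigma^{-1}} A^{\sigma^{-1}(1)}_1 A^{\sigma^{-1}(2)}_2 \cdots A^{\sigma^{-1}(n)}_n \\
&= \sum_{\sigma \in S_n} (-1)^\sigma A^1_{\sigma(1)} A^2_{\sigma(2)} \cdots A^n_{\sigma(n)}.
\end{aligned}
\]

The last sum is $\det A^{\mathsf T}$, since the $(i,j)$ entry of $A^{\mathsf T}$ is $A^j_i$.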


Theorem 3.51. If $A = [A^i_j]$ is upper triangular or lower triangular
then the determinant is the product of the diagonal entries:

\[
\det A = \prod_{i=1}^n A^i_i.
\]

Proof. Assume first that $A$ is upper triangular, namely that $A^i_j = 0$ for
$j < i$. In the polynomial formula for $\det A$, a summand

\[
(-1)^\sigma A^{\sigma(1)}_1 A^{\sigma(2)}_2 \cdots A^{\sigma(n)}_n
\]

is zero unless $\sigma(j) \leq j$ for each $j$. But the only permutation satisfying
these conditions is the identity: If $\sigma$ is not the identity, there is a
smallest index $j$ with $\sigma(j) \neq j$, and since $\sigma$ is a bijection of the set
$\{1, 2, \dots, n\}$, $\sigma(j) > j$. Consequently, if $A$ is upper triangular, then

\[
\det A = A^1_1 A^2_2 \cdots A^n_n.
\]

If $A$ is lower triangular, then $A^{\mathsf T}$ is upper triangular.
Determinants and Row Operations


The determinant of a square matrix is the signed volume of the box
spanned by its columns or (because of Theorem 3.50) by its rows. (This
is geometrically non-obvious; the boxes themselves do not generally have
the same shape.)

Properties (i) and (ii) for the determinant, regarded as a function
of the rows of an $m \times m$ matrix, specify how the determinant changes
under an elementary row operation:

Theorem 3.52. If $A$ is an $m \times m$ matrix, $E$ an elementary matrix,
and $A = EA'$, then $\det A = (\det E)(\det A')$.

Proof. We check the claim explicitly for row operations of each type.

(Type I). Adding a multiple of one row to another does not change
the determinant:

\[
\begin{aligned}
\operatorname{vol}(\dots, v_i + c v_j, \dots, v_j, \dots)
&= \operatorname{vol}(\dots, v_i, \dots, v_j, \dots) + c \operatorname{vol}(\dots, v_j, \dots, v_j, \dots) \\
&= \operatorname{vol}(\dots, v_i, \dots, v_j, \dots).
\end{aligned}
\]

But a Type I elementary matrix is triangular, with 1s on the diagonal,
so $\det E = 1$ in this case.

(Type II). Multiplying the $i$th row by $c$ multiplies the determinant
by $c$. But the Type II elementary matrix for this operation is diagonal,
with single $c$ in the $(i,i)$ entry and 1s elsewhere on the diagonal, so
$\det E = c$ in this case.

(Type III). Exchanging two rows of $A$ multiplies $\det A$ by $-1$. But
the corresponding Type III elementary matrix $E$ is obtained from the
identity by swapping two rows, so $\det E = -\det I_m = -1$ in this
case.


Remark 3.53. As a fringe benefit of the proof, the determinant of an 
elementary matrix is non-zero. 

Theorem 3.52 (together with induction on the number of factors)
shows that if $E$ is a product of elementary matrices, then (i) $\det E \neq 0$;
(ii) $\det(EC) = (\det E)(\det C)$ for every $m \times m$ matrix $C$.

Corollary 3.54. A matrix $A$ is invertible if and only if $\det A \neq 0$.

Proof. Put $A$ in reduced row-echelon form, i.e., factor $A = EA'$ with
$E$ a product of elementary matrices and $A'$ a reduced row-echelon
matrix, necessarily upper triangular.

By Theorem 1.49, $A$ is invertible if and only if $A' = I_m$, if and
only if $A'$ does not contain a row of 0s, if and only if $\det A' \neq 0$. The
corollary follows since $\det E \neq 0$ and $\det A = (\det E)(\det A')$.

Corollary 3.55. If $A$ and $B$ are $m \times m$ matrices, then

\[
\det(AB) = (\det A)(\det B).
\]

In particular, $\det(AB) = \det(BA)$.

Proof. Factor $A = EA'$ as in the preceding proof. If $A$ is invertible, then
$A' = I_m$, and $\det(AB) = \det(EB) = \det(E)(\det B) = \det(A)(\det B)$.

If $A$ is not invertible, then $\det A = 0$ by the preceding corollary.
Further, the reduced row-echelon matrix $A'$ has a row of 0s, so $A'B$ has
a row of zeros. The reduced row-echelon matrix of $A'B$ consequently
has a row of zeros, so $\det(A'B) = 0$, and

\[
\det(AB) = (\det E)(\det A'B) = 0 = (\det A)(\det B).
\]

Corollary 3.56. If $A$ is invertible, then $\det(A^{-1}) = 1/\det A$.

Proof. We have $I_m = AA^{-1}$, so $1 = \det I_m = (\det A)(\det A^{-1})$.

Corollary 3.57. If $A$ is an arbitrary $m \times m$ matrix and $P$ is invertible,
then $\det(P^{-1}AP) = \det A$.

Proof. We have $\det(P^{-1}AP) = (\det P^{-1})(\det A)(\det P) = \det A$.


Example 3.58. Find the signed volume of the box spanned by the 
ordered triple ((1, 1,1), (1,2, 4), (1,3, 9)). 

The task is to find the determinant of the matrix whose columns 
are the given vectors. Although there is a six-term formula for a 3 x 3 
determinant, we use row operations for simplicity (and illustration). 
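We also use vertical bars, $\det A = |A|$, to save space:

\[
\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{vmatrix}
= \begin{vmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 3 & 8 \end{vmatrix}
= \begin{vmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{vmatrix}
= 2.
\]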


For practice, check the answer with the six-term formula. 
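The row-operation method of this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `det_by_row_reduction` is hypothetical, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Determinant via row operations: a row swap flips the sign, and
    adding a multiple of one row to another leaves the value unchanged."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Look for a non-zero entry in this column, on or below the diagonal.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: determinant is 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        # Clear the entries below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det

# Columns are (1,1,1), (1,2,4), (1,3,9), as in Example 3.58.
A = [[1, 1, 1],
     [1, 2, 3],
     [1, 4, 9]]
print(det_by_row_reduction(A))  # 2
```

The running product of pivots, adjusted for swaps, is the determinant of the resulting upper triangular matrix.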
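Example 3.59. If $a < b < c$ are real numbers, then

\begin{align*}
\begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix}
&\overset{R_2 - R_1,\ R_3 - R_1}{=}
\begin{vmatrix} 1 & a & a^2 \\ 0 & b-a & (b-a)(b+a) \\ 0 & c-a & (c-a)(c+a) \end{vmatrix} \\
&= (b-a)(c-a)
\begin{vmatrix} 1 & a & a^2 \\ 0 & 1 & b+a \\ 0 & 1 & c+a \end{vmatrix} \\
&\overset{R_3 - R_2}{=} (b-a)(c-a)
\begin{vmatrix} 1 & a & a^2 \\ 0 & 1 & b+a \\ 0 & 0 & c-b \end{vmatrix}
= (b-a)(c-a)(c-b).
\end{align*}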


Exercises 


Exercise 3.1. Let $a$ be a non-zero vector in $\mathbf{R}^n$, and let $\lambda$ be a non-zero
real number.

(a) Use the orthogonal projection formula to show that
$\operatorname{proj}_{\lambda a}(x) = \operatorname{proj}_a(x)$ for all $x$.

(In words, orthogonal projection to $a$ depends only on the direction
of $a$, not on its magnitude.)

(b) If $a$ is a unit vector, how does the formula for $\operatorname{proj}_a(x)$ simplify?

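Exercise 3.2. Each part refers to
\[
u_1 = \tfrac{1}{\sqrt{3}}(1, 1, 1), \qquad
u_2 = \tfrac{1}{\sqrt{2}}(1, 0, -1), \qquad
u_3 = \tfrac{1}{\sqrt{6}}(1, -2, 1).
\]

(a) Verify that $\{u_1, u_2, u_3\}$ is an orthonormal set.

(b) Show by direct calculation that if $x = (x^1, x^2, x^3)$ is arbitrary, then
\[ x = \langle x, u_1\rangle\, u_1 + \langle x, u_2\rangle\, u_2 + \langle x, u_3\rangle\, u_3. \]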


Exercise 3.3. In $\mathbf{R}^4$, suppose $u_1 = \tfrac12(1, 1, 1, 1)$, $u_2 = \tfrac12(1, -1, -1, 1)$,
$u_3 = \tfrac12(1, -1, 1, -1)$, $u_4 = \tfrac12(1, 1, -1, -1)$.

(a) Verify that $\{u_1, u_2, u_3, u_4\}$ is an orthonormal set. (You need
to check six inner products, in addition to computing four magnitudes.)

(b) Calculate the four outer products $(u_j u_j^{\mathsf T})_{j=1}^4$.

(c) Show by direct calculation that if $x = (x^1, x^2, x^3, x^4)$ is arbitrary, then
\[ x = \langle x, u_1\rangle\, u_1 + \langle x, u_2\rangle\, u_2 + \langle x, u_3\rangle\, u_3 + \langle x, u_4\rangle\, u_4. \]

Exercise 3.4. Each part refers to the vectors $v_1 = (1, 1, 1, 1)$ and
$v_2 = (1, 2, 3, 4)$ in $\mathbf{R}^4$, and the plane $W$ they span.

(i) Use Gram-Schmidt to find an orthonormal basis of $W$.

(ii) If $x = (x^1, x^2, x^3, x^4)$ is arbitrary, decompose $x$ in $W \oplus W^\perp$.

(iii) Find a system of two equations in four variables whose solution set
is $W$. Similarly for $W^\perp$.

Exercise 3.5. Referring to Example 3.41:

(a) Form the matrix $P$ whose columns are $(u_j)_{j=1}^3$ and verify that
$P^{\mathsf T} P = I_3$ and $P P^{\mathsf T} = I_3$.

(b) Calculate the projection matrix $\operatorname{proj}_V = u_1 u_1^{\mathsf T} + u_2 u_2^{\mathsf T} + u_3 u_3^{\mathsf T}$
directly, and verify the result is consistent with Example 3.23.


Exercise 3.6. Let $x$ and $y$ be vectors in $\mathbf{R}^n$. The parallelogram they
span is the plane figure with vertices $0^n$, $x$, $y$, and $x + y$. The diagonals
are the segments from $0^n$ to $x + y$ and from $x$ to $y$.

(a) Prove the parallelogram law:
\[ \|x + y\|^2 + \|x - y\|^2 = 2\bigl(\|x\|^2 + \|y\|^2\bigr), \]
and give a geometric interpretation.

(b) Prove the polarization identity:
\[ \|x + y\|^2 - \|x - y\|^2 = 4\,\langle x, y\rangle. \]

Conclude that the diagonals have equal magnitude if and only if the
parallelogram is a rectangle.


(c) Prove that the diagonals of a parallelogram are orthogonal if and 
only if the parallelogram is a rhombus. 


Exercise 3.7. Let $P$ be an $n \times n$ matrix. Prove that $P$ is orthogonal
if and only if $\|Px\| = \|x\|$ for all $x$ in $\mathbf{R}^n$.

Suggestion: One direction is immediate from Theorem 3.33. For the 
other direction, use the polarization identity (preceding exercise). 

Chapter 4


Linear Transformations 


4.1 Linear Transformations 


Definition 4.1. Let $V$ and $W$ be arbitrary vector spaces. A mapping
$T : V \to W$ is said to be a linear transformation if $T$ satisfies the
morphism condition:

\[ T(cx + y) = cT(x) + T(y) \quad \text{for all $x$, $y$ in $V$ and all real $c$.} \]


Remark 4.2. Geometrically, $T(cx + y) = cT(x) + T(y)$ means $T$ maps
the parallelogram in $V$ with sides $cx$ and $y$ to the parallelogram in $W$
with sides $cT(x)$ and $T(y)$, Figure 4.1. The mapping $T$ is a linear
transformation if and only if $T$ has this effect on every parallelogram.


Figure 4.1: A linear transformation preserves parallelograms.


Remark 4.3. Induction on the number of summands proves that “a
linear transformation distributes over linear combinations”. Precisely,
if $v_1$, ..., $v_k$ are elements of $V$ and $x^1$, ..., $x^k$ are real numbers, then

\[ T\Bigl(\sum_{j=1}^{k} x^j v_j\Bigr) = \sum_{j=1}^{k} x^j\, T(v_j). \]


Definition 4.4. Let $V$, $V'$, and $V''$ be vector spaces, $T : V \to V'$ and
$T' : V' \to V''$ linear transformations. The composition $T'T : V \to V''$
is defined by $T'T(x) = T'(T(x))$ for $x$ in $V$.


Remark 4.5. A composition T’T is linear, Exercise 4.8. 


A linear transformation is the analog in linear algebra of a group 
homomorphism. Much of the terminology carries over, as do basic 
results about homomorphisms. 


Definition 4.6. Let $T : V \to W$ be a linear transformation.

The kernel of $T$ is the set
\[ \ker(T) = \{x \text{ in } V : T(x) = 0^W\} \subseteq V. \]

The image of $T$ is the set
\[ \operatorname{im}(T) = T(V) = \{y \text{ in } W : y = T(x) \text{ for some $x$ in $V$}\} \subseteq W. \]

If, for all $x_1$ and $x_2$ in $V$, $T(x_1) = T(x_2)$ implies $x_1 = x_2$, then
$T$ is said to be injective.

If $T(V) = W$, i.e., if, for every $y$ in $W$, there exists an $x$ in $V$ such
that $y = T(x)$, then $T$ is said to be surjective.

If $T$ is bijective, i.e., both injective and surjective, then $T$ is an
isomorphism (of vector spaces).


Theorem 4.7. Let $T : V \to W$ be a linear transformation.

(i) The kernel of $T$ is a subspace of $V$.

(ii) The image of $T$ is a subspace of $W$.

(iii) $T$ is injective if and only if $\ker(T) = \{0^V\}$, i.e., $\dim\ker(T) = 0$.

(iv) If $T$ is an isomorphism, then the inverse map $T^{-1} : W \to V$ is an
isomorphism.


Proof. (i). We first show $T(0^V) = 0^W$, proving the kernel is non-empty:
If $x$ is an arbitrary element of $V$, then
\[ T(0^V) = T(0 \cdot x) = 0 \cdot T(x) = 0^W. \]
If $x_1$ and $x_2$ are elements of $\ker(T)$ and $c$ is real, then
\[ T(cx_1 + x_2) = cT(x_1) + T(x_2) = c \cdot 0^W + 0^W = 0^W, \]
so $cx_1 + x_2 \in \ker(T)$. By Theorem 2.26, $\ker(T)$ is a subspace of $V$.


(ii). Suppose $y_1$ and $y_2$ are elements of $T(V)$ and $c$ is real. By
hypothesis, there exist $x_1$ and $x_2$ in $V$ such that $T(x_1) = y_1$ and
$T(x_2) = y_2$. Consequently,
\[ T(cx_1 + x_2) = cT(x_1) + T(x_2) = cy_1 + y_2, \]
so $cy_1 + y_2 \in T(V)$. By Theorem 2.26, $T(V)$ is a subspace of $W$.


(iii). Suppose $T$ is injective. If $x$ is an arbitrary vector in $\ker(T)$,
then $T(x) = 0^W = T(0^V)$, so $x = 0^V$. That is, $\ker(T) = \{0^V\}$.

Conversely, suppose $\ker(T) = \{0^V\}$. If $T(x_1) = T(x_2)$, we have
$0^W = T(x_1) - T(x_2) = T(x_1 - x_2)$, i.e., $x_1 - x_2 \in \ker(T) = \{0^V\}$.
That is, $x_1 = x_2$. Since $x_1$ and $x_2$ were arbitrary, $T$ is injective.

(iv). The mapping $T^{-1} : W \to V$ is defined by $T^{-1}(y) = x$ if and
only if $T(x) = y$. Bijectivity of $T^{-1}$ is obvious; it suffices to prove
$T^{-1}$ is linear.

Suppose $y_1$ and $y_2$ are elements of $W$, $c$ is real, and $x_1$ and $x_2$ are
the unique vectors in $V$ such that $T(x_1) = y_1$ and $T(x_2) = y_2$. Since
\[ cy_1 + y_2 = cT(x_1) + T(x_2) = T(cx_1 + x_2) \]
by linearity of $T$, the definition of $T^{-1}$ gives
\[ T^{-1}(cy_1 + y_2) = cx_1 + x_2 = cT^{-1}(y_1) + T^{-1}(y_2). \]


Corollary 4.8. If $V$ is finite-dimensional and $T : V \to W$ is a linear
transformation, then $\dim T(V) \le \dim V$, with equality if and only if
$T$ is injective.

Proof. If $(v_j)_{j=1}^n$ is a basis of $V$, and if we put $w_j = T(v_j)$, then the
$n$-element set $(w_j)_{j=1}^n$ spans $T(V)$: Every $x$ in $V$ may be written uniquely
as $\sum_j x^j v_j$, and by linearity, $T(x) = \sum_j x^j w_j$. Theorem 2.64 implies
$\dim T(V) \le \dim V$.

Equality of dimensions holds if and only if $(w_j)_{j=1}^n$ is linearly
independent, if and only if no non-trivial linear combination $\sum_j x^j w_j$ is $0^W$,
if and only if no non-trivial linear combination $\sum_j x^j v_j$ is in $\ker(T)$, if
and only if $T$ is injective.


Remark 4.9. In words, applying a linear transformation $T$ cannot
increase dimension, and strictly reduces dimension unless $T$ is injective.


Examples of Linear Transformations


Example 4.10. Let $V$ and $W$ be arbitrary vector spaces. The zero
map $0^W_V : V \to W$, defined by $0^W_V(x) = 0^W$ for all $x$ in $V$, is linear.


Example 4.11. Let $V$ be an arbitrary vector space. The identity map
$I_V : V \to V$, defined by $I_V(x) = x$ for all $x$ in $V$, is linear.


Example 4.12. If $A = [A^i_j]$ is an $m \times n$ real matrix, multiplication
by $A$ defines a linear transformation $\mu_A : \mathbf{R}^n \to \mathbf{R}^m$ via $\mu_A(x) = Ax$.
In components, if $x = [x^j]$, and $(e_i)_{i=1}^m$ is the standard basis of $\mathbf{R}^m$,
then
\[ \mu_A(x) = \sum_{i=1}^{m} \Bigl(\sum_{j=1}^{n} A^i_j x^j\Bigr) e_i. \]

Linearity follows from properties of matrix operations, Theorem 1.19:
\[ \mu_A(cx + y) = A(cx + y) = c(Ax) + Ay = c\,\mu_A(x) + \mu_A(y). \]

Conversely, every linear transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ can be written
as matrix multiplication in this way, Theorem 4.26.
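The componentwise formula translates directly into code. The Python sketch below implements $\mu_A$ for a matrix stored as a list of rows and checks the morphism condition on sample data; the function name `mat_vec` and the particular `A`, `x`, `y`, `c` are illustrative assumptions.

```python
def mat_vec(A, x):
    """mu_A(x) = Ax: the i-th component is sum_j A[i][j] * x[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# A is 2 x 3, so mu_A maps R^3 to R^2.
A = [[1, 2, 0],
     [0, -1, 3]]
x = [1, 0, 2]
y = [4, -1, 1]
c = 3

# Morphism condition: mu_A(c x + y) = c mu_A(x) + mu_A(y).
lhs = mat_vec(A, [c * xj + yj for xj, yj in zip(x, y)])
rhs = [c * u + v for u, v in zip(mat_vec(A, x), mat_vec(A, y))]
print(lhs, rhs)  # [5, 22] [5, 22]

# The j-th column of A is the image of the j-th standard basis vector.
e2 = [0, 1, 0]
print(mat_vec(A, e2))  # [2, -1]
```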


Theorem 4.13. Let $(V, +, \cdot)$ be an arbitrary finite-dimensional vector
space of dimension $n$, and let $S = (v_j)_{j=1}^n$ be a basis. The mapping
$\iota^S : V \to \mathbf{R}^n$ that sends each $x$ in $V$ to its coordinate vector
$\iota^S(x) = [x]^S$ in $\mathbf{R}^n$ is an isomorphism.


Proof. ($\iota^S$ is linear). If $x$ is an arbitrary element of $V$, then by
definition, $[x]^S = [x^j]$ if and only if
\[ x = x^1 v_1 + \cdots + x^n v_n = \sum_{j=1}^{n} x^j v_j. \]

If $y$ is an arbitrary element of $V$ with $[y]^S = [y^j]$, and if $c$ is real, then
\[ cx + y = c\sum_{j=1}^{n} x^j v_j + \sum_{j=1}^{n} y^j v_j = \sum_{j=1}^{n} (cx^j + y^j)\, v_j. \]

In terms of coordinate vectors,
\[ [cx + y]^S = [cx^j + y^j] = c[x^j] + [y^j] = c[x]^S + [y]^S. \]

($\iota^S$ is bijective). The inverse mapping $(\iota^S)^{-1} : \mathbf{R}^n \to V$ sends an
ordered $n$-tuple $[x^j]$ to the linear combination from $S$ with the
corresponding coefficients,
\[ (\iota^S)^{-1}\bigl([x^j]\bigr) = \sum_{j=1}^{n} x^j v_j. \]

Existence of an inverse proves bijectivity.


Figure 4.2: Coordinate vectors in a two-dimensional vector space.


Remark 4.14. Geometrically, a basis $S$ with $n$ elements defines a
coordinate grid consisting of linear combinations where at least $(n - 1)$
coefficients are integers, Figure 4.2; the isomorphism $\iota^S$ assigns to each
location $x$ in $V$ an “address” $[x]^S$ in $\mathbf{R}^n$.

Conceptually, the mapping $\iota^S$ and its inverse define a “dictionary”
between $V$ and $\mathbf{R}^n$ in the presence of a basis $S$ of $V$. This dictionary
depends on $S$. Material below on “change of basis” tells us how to
“translate” between the dictionaries associated to two bases.
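To make the “dictionary” concrete, the following Python sketch computes coordinate vectors with respect to a basis of the plane and converts them back. It is a minimal illustration, assuming $V = \mathbf{R}^2$ and using Cramer's rule; the function names and the sample basis are hypothetical.

```python
from fractions import Fraction

def coordinates(v1, v2, x):
    """Coordinate vector of x with respect to the basis S = (v1, v2)
    of R^2: solve x = x1*v1 + x2*v2 by Cramer's rule."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("v1 and v2 do not form a basis")
    x1 = Fraction(x[0] * v2[1] - v2[0] * x[1], det)
    x2 = Fraction(v1[0] * x[1] - x[0] * v1[1], det)
    return [x1, x2]

def from_coordinates(v1, v2, coords):
    """Inverse map: the linear combination of S with the given coefficients."""
    return [coords[0] * v1[i] + coords[1] * v2[i] for i in range(2)]

v1, v2 = (2, 1), (1, 3)   # a sample basis of the plane
x = (4, 7)

addr = coordinates(v1, v2, x)
print(addr)                            # [Fraction(1, 1), Fraction(2, 1)]
print(from_coordinates(v1, v2, addr))  # [Fraction(4, 1), Fraction(7, 1)]
```

Changing the basis changes the address of the same vector, which is the dependence on $S$ noted above.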


Example 4.15. Let $V = C^\infty(\mathbf{R}, \mathbf{R})$ be the vector space of smooth
functions under ordinary addition and scalar multiplication. The
following define linear transformations from $V$ to $V$ (for (i)-(iii)) or from $V$
to $\mathbf{R}$ (for (iv), (v)). Linearity follows from theorems of calculus; $a$ and $b$
denote arbitrary real numbers.

(i) Differentiation, $D(f) = f'$.

(ii) Integration from $a$, $I_a(f)(x) = \int_a^x f(t)\,dt$.

(iii) Multiplication by a smooth function $\phi$, $\mu_\phi(f) = \phi f$.

(iv) Evaluation at $a$, $E_a(f) = f(a)$.

(v) Integration over $[a, b]$, $I_a^b(f) = \int_a^b f(t)\,dt$.


Linear Transformations on the Plane


Axial scaling, rotation, projection, and reflection are linear
transformations $T : \mathbf{R}^2 \to \mathbf{R}^2$. If $(e_j)_{j=1}^2$ is the standard basis and if $(w_j)_{j=1}^2$
is an arbitrary pair of vectors (possibly equal), there exists a unique
$2 \times 2$ matrix $A$ satisfying $Ae_1 = w_1$ and $Ae_2 = w_2$, namely the matrix
whose columns are $w_1$ and $w_2$. Figure 4.3 shows the geometric effect
on the plane.

Example 4.16. (Axial scaling) Let $a_1$ and $a_2$ be real numbers, and
define
\[
T(x^1, x^2) = (a_1 x^1, a_2 x^2)
= \begin{bmatrix} a_1 & 0 \\ 0 & a_2 \end{bmatrix}
\begin{bmatrix} x^1 \\ x^2 \end{bmatrix}.
\]

Here, $w_j = a_j e_j$. The geometric effect is to “scale” each coordinate
axis. Axis-aligned rectangles map to axis-aligned rectangles under $T$,
but if $|a_1| \neq |a_2|$, “tilted” rectangles do not map to rectangles.


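Example 4.17. Let $\theta$ be real. Rotating the plane counterclockwise
about the origin through the angle $\theta$ maps the standard basis vector $e_1$
to $w_1 = (\cos\theta, \sin\theta)$, and maps $e_2$ to $w_2 = (-\sin\theta, \cos\theta)$. Rotation
preserves parallelograms, and is therefore a linear transformation
$\operatorname{Rot}_\theta$, given by
\[
\operatorname{Rot}_\theta(x^1, x^2)
= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x^1 \\ x^2 \end{bmatrix}
= \begin{bmatrix} x^1\cos\theta - x^2\sin\theta \\ x^1\sin\theta + x^2\cos\theta \end{bmatrix}.
\]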


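Example 4.18. Let $\theta$ be real, $a = (\cos\theta, \sin\theta)$. Orthogonal
projection $\operatorname{proj}_a$ maps $e_1$ to $w_1 = \cos\theta\,(\cos\theta, \sin\theta)$, and maps $e_2$ to
$w_2 = \sin\theta\,(\cos\theta, \sin\theta)$. The matrix of this linear transformation is
\[
\begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix}
= \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}
\begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix}.
\]
The factorization on the right is an outer product.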


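Example 4.19. Let $\theta$ be real, $a = (\cos\theta, \sin\theta)$, $a^\perp = (-\sin\theta, \cos\theta)$.
Reflection across $a$ is defined by
\[ \operatorname{Ref}_\theta(x) = x - 2\operatorname{proj}_{a^\perp}(x) = -x + 2\operatorname{proj}_a(x). \]

The matrix of this linear transformation may be found by calculating
$\operatorname{Ref}_\theta(e_1)$ and $\operatorname{Ref}_\theta(e_2)$, or by using the preceding example:
\[
\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}
+ 2\begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix}
= \begin{bmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{bmatrix}.
\]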
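The matrices of Examples 4.17 to 4.19 can be checked against one another numerically. The Python sketch below builds the three matrices and tests a few identities in floating point; the function names and the tolerance are illustrative assumptions.

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def projection(theta):
    """Orthogonal projection to the line spanned by (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def reflection(theta):
    """Reflection across the line spanned by (cos theta, sin theta)."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return [[c2, s2], [s2, -c2]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to rounding error."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

theta = math.pi / 6
identity = [[1.0, 0.0], [0.0, 1.0]]
P = projection(theta)
two_P_minus_I = [[2 * P[i][j] - identity[i][j] for j in range(2)] for i in range(2)]

print(close(reflection(theta), two_P_minus_I))                         # True
print(close(matmul(P, P), P))                                          # True
print(close(matmul(reflection(theta), reflection(theta)), identity))   # True
print(close(matmul(rotation(theta), rotation(-theta)), identity))      # True
```

The second and third checks express that projecting twice is the same as projecting once, and that reflecting twice returns every vector to its starting position.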

Figure 4.3: The identity, axial scaling, rotation, and reflection.


4.2 The Space of Linear Transformations 


Proposition 4.20. If $T_1$ and $T_2 : V \to W$ are linear transformations
and $a$ is real, the linear combination $aT_1 + T_2 : V \to W$ defined by
\[ (aT_1 + T_2)(x) = aT_1(x) + T_2(x) \quad \text{for all $x$ in $V$} \]
is a linear transformation.


Proof. If $x$ and $y$ are arbitrary elements of $V$ and $c$ is real, then
\begin{align*}
(aT_1 + T_2)(cx + y) &= aT_1(cx + y) + T_2(cx + y) \\
&= a[cT_1(x) + T_1(y)] + [cT_2(x) + T_2(y)] \\
&= c[aT_1(x) + T_2(x)] + [aT_1(y) + T_2(y)] \\
&= c(aT_1 + T_2)(x) + (aT_1 + T_2)(y).
\end{align*}


Remark 4.21. Let $V$ and $W$ be vector spaces. The set $\mathcal{L}(V, W)$ of linear
transformations $T : V \to W$ becomes a vector space under the
operations of addition and scalar multiplication defined in the proposition.
If $V = \mathbf{R}^n$ and $W = \mathbf{R}^m$, it turns out that $\mathcal{L}(V, W) \cong \mathbf{R}^{m \times n}$.


Proposition 4.22. Let $V$ be a vector space with basis $S = (v_\alpha)$. If
$T : V \to W$ is a linear transformation satisfying $T(v_\alpha) = 0^W$ for all $\alpha$,
then $T(x) = 0^W$ for all $x$ in $V$.


The notation $(v_\alpha)$ indicates an arbitrary family of vectors; that is,
the proposition holds even for infinite-dimensional spaces.


Proof. Let $x$ be an arbitrary element of $V$, and write $x$ as a linear
combination of basis elements. Since a linear combination is finite by
definition, we have
\[ x = \sum_{j=1}^{n} x^j v_j \]
for some basis vectors $v_j$ in $S$ and some scalars $x^j$. By linearity,
\[ T(x) = T\Bigl(\sum_{j=1}^{n} x^j v_j\Bigr) = \sum_{j=1}^{n} x^j\, T(v_j) = \sum_{j=1}^{n} x^j\, 0^W = 0^W. \]


Corollary 4.23. If $S$ is a basis of $(V, +, \cdot)$, and if $T_1$ and $T_2$ are linear
transformations from $V$ to $W$ that agree on $S$, in that $T_1(v_\alpha) = T_2(v_\alpha)$
for all $v_\alpha$ in $S$, then $T_1(x) = T_2(x)$ for all $x$ in $V$.


Proof. Apply Proposition 4.22 to $T_1 - T_2$.


Theorem 4.24. Let $V$ be an $n$-dimensional vector space, $S = (v_j)_{j=1}^n$
a basis, $W$ an arbitrary vector space, and $S' = (w_j)_{j=1}^n$ an arbitrary
ordered set of vectors in $W$.

(i) There exists a unique linear transformation $T : V \to W$ satisfying
$T(v_j) = w_j$ for $j = 1, \dots, n$.

(ii) $T$ is injective if and only if $S'$ is linearly independent.

(iii) $T$ is surjective if and only if $S'$ spans $W$.


Remark 4.25. In words, (i) says a linear transformation is uniquely
specified by its values on a basis. If ordered sets $S$ and $S'$ of vectors are
given as in the theorem, the transformation $T$ is said to be obtained
from the conditions $T(v_j) = w_j$ via extension by linearity.


Proof. (i). Because $S$ is a basis of $V$, every vector $x$ in $V$ may be written
uniquely as a linear combination $\sum_j x^j v_j$ from $S$. If $T$ is linear, then
\[ T(x) = T\Bigl(\sum_{j=1}^{n} x^j v_j\Bigr) = \sum_{j=1}^{n} x^j w_j. \]

This formula defines a mapping $T : V \to W$, easily checked to be linear,
and to satisfy $T(v_j) = w_j$.

Uniqueness is asserted by Corollary 4.23.

In the remainder of the proof, the preceding formula, which
expresses $T(x)$ as the general linear combination from $S'$, is used freely.

(ii). A vector $x$ is in $\ker(T)$ if and only if $0^W = T(x) = \sum_j x^j w_j$.
But $T$ is injective if and only if $\ker(T) = \{0^V\}$, if and only if the
equation $0^W = \sum_j x^j w_j$ has only the trivial solution $[x^j] = 0^n$, if and
only if $S'$ is linearly independent.

(iii). A vector $y$ is in $T(V)$ if and only if there exist scalars $x^j$ such
that $y = T(x) = \sum_j x^j w_j$. But $T$ is surjective if and only if $T(V) = W$,
if and only if the equation $y = \sum_j x^j w_j$ has a solution $[x^j]$ for every $y$
in $W$, if and only if $S'$ spans $W$.


The Matrix of a Transformation 


We now have enough machinery to express an arbitrary linear transfor- 
mation between finite-dimensional vector spaces in terms of matrices 
of real numbers. 


Theorem 4.26. Let $V$ and $W$ be finite-dimensional vector spaces, and
assume $S = (v_j)_{j=1}^n$ and $S' = (w_i)_{i=1}^m$ are bases for $V$ and $W$,
respectively. If $T : V \to W$ is a linear transformation, there exist unique
scalars $A^i_j$, $i = 1, \dots, m$, $j = 1, \dots, n$, such that
\[ T(v_j) = \sum_{i=1}^{m} A^i_j w_i \quad \text{for } j = 1, \dots, n. \]


Proof. For each $j$, the vector $T(v_j)$ can be written uniquely as a linear
combination from $S'$. If $A^i_j$ is the $w_i$-component of $T(v_j)$, then the
equation in the theorem holds.


Definition 4.27. The $m \times n$ matrix $[T]^{S'}_S = [A^i_j]$ is called the matrix
of $T$ with respect to $S$ and $S'$.


Remark 4.28. If bases $S$ for $V$ and $S'$ for $W$ are fixed, this association
of a matrix to a linear transformation defines a linear isomorphism $\iota^{S'}_S$
from the vector space $\mathcal{L}(V, W)$ to the vector space $\mathbf{R}^{m \times n}$ of $m \times n$ real
matrices. Theorem 4.26 is the analog, for linear transformations, of the
“coordinatization” in Theorem 4.13 for vectors.


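Example 4.29. Let $V$ be a vector space with basis $S = (v_j)_{j=1}^n$. The
matrix of the identity transformation is
\[
[I_V]^S_S = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix},
\]
whose $j$th column encodes the equation $I_V(v_j) = v_j$.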

Example 4.30. Let $V$ and $W$ be finite-dimensional vector spaces with
respective bases $S = (v_j)_{j=1}^n$ and $S' = (w_i)_{i=1}^m$. For each pair $(i, j)$
with $i = 1, \dots, m$ and $j = 1, \dots, n$, let $E^j_i : V \to W$ be the linear
transformation defined by $E^j_i(v_j) = w_i$ and $E^j_i(v_k) = 0^W$ if $k \neq j$;
that is, $E^j_i(v_k) = \delta^j_k w_i$. The matrix $[E^j_i]^{S'}_S$ is $e^j_i = e_i e^j$, having a $1$ in
the $(i, j)$ entry and $0$'s elsewhere.


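Example 4.31. If $T : \mathbf{R}^n \to \mathbf{R}^m$ has one-dimensional image, there
exist non-zero vectors $v$ in $\mathbf{R}^n$ and $w$ in $\mathbf{R}^m$ such that
\[ T(x) = \langle v, x\rangle\, w = w v^{\mathsf T} x \quad \text{for all $x$ in $\mathbf{R}^n$.} \]

The standard matrix of $T$ is $w v^{\mathsf T}$. The kernel and image of $T$ are
$\ker(T) = \operatorname{Span}(v)^\perp$ and $\operatorname{im}(T) = \operatorname{Span}(w)$.


Proof. By hypothesis, the image of $T$ is $1$-dimensional. Let $w$ be an arbitrary
non-zero vector in the image. If $(e_j)_{j=1}^n$ is the standard basis of $\mathbf{R}^n$,
then for each $j$ there is a scalar $v^j$ such that $T(e_j) = v^j w$. Put $v = [v^j]$.

If $x = \sum_j x^j e_j$ is an arbitrary vector in $\mathbf{R}^n$, then
\[ T(x) = \sum_{j=1}^{n} x^j\, T(e_j) = \sum_{j=1}^{n} x^j v^j\, w = \langle v, x\rangle\, w. \]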
j=l j=} 


Corollary 4.32. In the notation of the theorem,
\[ [T(x)]^{S'} = [T]^{S'}_S\, [x]^S \quad \text{for all $x$ in $V$.} \]


Proof. Write $[T]^{S'}_S = [A^i_j]$ and $[x]^S = [x^j]$, so that $x = \sum_{j=1}^{n} x^j v_j$. We
want to express $T(x)$ with respect to $S'$. Substituting,
\[
T(x) = \sum_{j=1}^{n} x^j\, T(v_j)
= \sum_{j=1}^{n} x^j \sum_{i=1}^{m} A^i_j w_i
= \sum_{i=1}^{m} \Bigl(\sum_{j=1}^{n} A^i_j x^j\Bigr) w_i.
\]

The expression in parentheses is precisely the $i$th row of the matrix
product $[T]^{S'}_S [x]^S$, and is also the $w_i$-component of $T(x)$, i.e., the
$i$th row of $[T(x)]^{S'}$.

Corollary 4.33. If $T : V \to V'$ and $T' : V' \to V''$ are linear
transformations between finite-dimensional vector spaces with respective bases
$S$, $S'$, and $S''$, then
\[ [T'T]^{S''}_S = [T']^{S''}_{S'}\, [T]^{S'}_S. \]

Proof. For all $x$ in $V$, Corollary 4.32 gives
\[
[T'T]^{S''}_S [x]^S = [T'T(x)]^{S''} = [T']^{S''}_{S'} [T(x)]^{S'} = [T']^{S''}_{S'} [T]^{S'}_S [x]^S.
\]

By Corollary 1.23, $[T'T]^{S''}_S = [T']^{S''}_{S'} [T]^{S'}_S$.
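For transformations between Cartesian spaces with their standard bases, the corollary says that applying two maps in succession agrees with multiplying by the product matrix. The Python sketch below checks this on sample data; the matrices `T` and `T_prime` and the helper names are illustrative assumptions.

```python
def mat_vec(A, x):
    """Matrix times column vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix product; the column count of A must equal the row count of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

T = [[1, 2],
     [0, 1],
     [3, -1]]            # standard matrix of a map R^2 -> R^3
T_prime = [[2, 0, 1],
           [1, -1, 0]]   # standard matrix of a map R^3 -> R^2

x = [5, -2]
step_by_step = mat_vec(T_prime, mat_vec(T, x))   # T'(T(x))
composite = matmul(T_prime, T)                   # matrix of T'T

print(composite)                # [[5, 3], [1, 1]]
print(step_by_step)             # [19, 3]
print(mat_vec(composite, x))    # [19, 3]
```

Note the order of the factors: the matrix of the transformation applied first stands on the right.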


Remark 4.34. In words, evaluation of a linear transformation and com- 
position of transformations correspond to matrix multiplication. These 
results are a basic computational idiom in linear algebra, and justify 
the definition of matrix multiplication. 

The conclusions of Corollaries 4.32 and 4.33 may be expressed, re- 
spectively, as commutative diagrams, collections of spaces and mappings 
in which two compositions with the same domain and target are the 
same mapping. Specifically: 


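\[
\begin{array}{ccc}
x \in V & \xrightarrow{\ T\ } & T(x) \in W \\
\big\downarrow{\scriptstyle \iota^{S}} & & \big\downarrow{\scriptstyle \iota^{S'}} \\
{[x]^{S}} \in \mathbf{R}^n & \xrightarrow{\ [T]^{S'}_{S}\ } & {[T(x)]^{S'}} \in \mathbf{R}^m
\end{array}
\qquad
\begin{array}{ccccc}
V & \xrightarrow{\ T\ } & V' & \xrightarrow{\ T'\ } & V'' \\
\big\downarrow{\scriptstyle \iota^{S}} & & \big\downarrow{\scriptstyle \iota^{S'}} & & \big\downarrow{\scriptstyle \iota^{S''}} \\
\mathbf{R}^{n} & \xrightarrow{\ [T]^{S'}_{S}\ } & \mathbf{R}^{n'} & \xrightarrow{\ [T']^{S''}_{S'}\ } & \mathbf{R}^{n''}
\end{array}
\]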


Change of Basis 


Definition 4.35. Let $V$ be an $n$-dimensional vector space with bases
$S$ and $S'$. The $n \times n$ matrix $[I_V]^{S'}_S$ is called the transition matrix from $S$
to $S'$.


Remark 4.36. By Corollary 4.32, $[x]^{S'} = [I_V]^{S'}_S [x]^S$ for all $x$ in $V$. That
is, the transition matrix “converts” coordinate vectors with respect to $S$
into coordinate vectors with respect to $S'$.

It may help to think of an element $x$ in $V$ as an object, and to think
of the coordinate vector $[x]^S$ as a description of the object in the
language $S$. The transition matrix $[I_V]^{S'}_S$ is a “dictionary” or “translator”
from language $S$ to language $S'$.


Corollary 4.37. Let $V$ be an $n$-dimensional vector space with bases
$S$ and $S'$.

(i) The transition matrix $[I_V]^{S'}_S$ is invertible, and the inverse is the
transition matrix $[I_V]^S_{S'}$.

(ii) If $T : V \to V$ is linear, then $[T]^{S'}_{S'} = [I_V]^{S'}_S\, [T]^S_S\, [I_V]^S_{S'}$.

Proof. Example 4.29 notes that the matrix of the identity
transformation with respect to a single basis is the $n \times n$ identity matrix $I_n$. By
Corollary 4.33,
\[
[I_V]^{S'}_S [I_V]^S_{S'} = [I_V]^{S'}_{S'} = I_n, \qquad
[I_V]^S_{S'} [I_V]^{S'}_S = [I_V]^S_S = I_n.
\]

Assertion (ii) follows immediately from Corollary 4.33.


Definition 4.38. Let $n$ be a positive integer, and let $A_1$ and $A_2$ be
$n \times n$ matrices. We say $A_2$ is similar to $A_1$ if there exists an invertible
$n \times n$ matrix $P$ such that $A_2 = P^{-1} A_1 P$.


Remark 4.39. Corollary 4.37 (ii) says that if T is a linear operator on V 
(a linear transformation from V to itself), then the matrices of T with 
respect to arbitrary bases are similar. 


Proposition 4.40. Similarity is an equivalence relation on the set of
$n \times n$ matrices.


Proof. (Reflexivity). Since the identity matrix $I_n$ is its own inverse,
$A = I_n A I_n = I_n^{-1} A I_n$.

(Symmetry). Suppose $A_2 = P^{-1} A_1 P$. Multiplying on the left by
$(P^{-1})^{-1}$ and on the right by $P^{-1}$ gives $(P^{-1})^{-1} A_2 P^{-1} = A_1$. In words,
if $A_2$ is similar to $A_1$, then $A_1$ is similar to $A_2$.

(Transitivity). Suppose $A_2 = P^{-1} A_1 P$ and $A_3 = Q^{-1} A_2 Q$.
Substituting the first into the second,
\[ A_3 = Q^{-1} A_2 Q = Q^{-1} P^{-1} A_1 P Q = (PQ)^{-1} A_1 (PQ), \]
so $A_3$ is similar to $A_1$.


Remark 4.41. Detecting whether two $n \times n$ matrices are similar or not
requires work. Chapter 5 is devoted to locating, within the similarity
class of a matrix $A$, a “proxy” having a particularly simple structure.
Two matrices are similar if and only if they have the same proxy.


Example 4.42. Let $V = P_3$ be the four-dimensional vector space of
cubic polynomials, $S = (1, t, t^2, t^3) = (v_j)_{j=1}^4$ the “standard” basis,
$S' = (1,\ 1 + t,\ 1 + t + t^2,\ 1 + t + t^2 + t^3) = (v'_j)_{j=1}^4$ a non-standard basis,
and $D : P_3 \to P_3$ the derivative operator from calculus, $D(p) = p'$. We
will find the transition matrices between $S$ and $S'$, calculate the matrix
of $D$ with respect to $S$, and use Corollary 4.37 to find the matrix of $D$
with respect to $S'$.

The $j$th column of the transition matrix $[I]^S_{S'}$ is the coordinate
vector $[v'_j]^S$ of the $j$th element of $S'$ with respect to $S$. By inspection,
\[
[I]^S_{S'} = \begin{bmatrix}
1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}.
\]


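The inverse can be computed either by calculating the coordinate
vectors $[v_j]^{S'}$, or by using row operations on the preceding matrix. Both
are shown for illustration.

To calculate coordinate vectors, set up the vector equations
\begin{align*}
v_1 &= \underline{\quad}\, v'_1 + \underline{\quad}\, v'_2 + \underline{\quad}\, v'_3 + \underline{\quad}\, v'_4, \\
v_2 &= \underline{\quad}\, v'_1 + \underline{\quad}\, v'_2 + \underline{\quad}\, v'_3 + \underline{\quad}\, v'_4, \\
v_3 &= \underline{\quad}\, v'_1 + \underline{\quad}\, v'_2 + \underline{\quad}\, v'_3 + \underline{\quad}\, v'_4, \\
v_4 &= \underline{\quad}\, v'_1 + \underline{\quad}\, v'_2 + \underline{\quad}\, v'_3 + \underline{\quad}\, v'_4.
\end{align*}

In general, each equation becomes a system; here, the system is easy
to solve by inspection:
\begin{align*}
1 &= \phantom{-}1\, v'_1 + 0\, v'_2 + 0\, v'_3 + 0\, v'_4, \\
t &= -1\, v'_1 + 1\, v'_2 + 0\, v'_3 + 0\, v'_4, \\
t^2 &= \phantom{-}0\, v'_1 - 1\, v'_2 + 1\, v'_3 + 0\, v'_4, \\
t^3 &= \phantom{-}0\, v'_1 + 0\, v'_2 - 1\, v'_3 + 1\, v'_4.
\end{align*}

The coefficients in the $j$th row are the $j$th column of $[I]^{S'}_S$.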
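Alternatively, use the matrix inversion algorithm:
\[
\left[\begin{array}{cccc|cccc}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{array}\right]
\xrightarrow{\ R_1 - R_2,\ R_2 - R_3,\ R_3 - R_4\ }
\left[\begin{array}{cccc|cccc}
1 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{array}\right].
\]

The right-hand block is $[I]^{S'}_S$.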
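To calculate the standard matrix of $D$, evaluate $D(v_j)$, then express
the result as a coordinate vector with respect to $S$. Here, $D(1) = 0$,
$D(t) = 1$, $D(t^2) = 2t$, and $D(t^3) = 3t^2$. The standard coordinate
vectors may be read off by inspection, giving
\[
[D]^S_S = \begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 3 \\
0 & 0 & 0 & 0
\end{bmatrix}.
\]

Finally, $[I]^{S'}_S\, [D]^S_S\, [I]^S_{S'} = [D]^{S'}_{S'}$, or
\[
\begin{bmatrix}
1 & -1 & 0 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 1 & -1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 3 \\
0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
0 & 1 & -1 & -1 \\
0 & 0 & 2 & -1 \\
0 & 0 & 0 & 3 \\
0 & 0 & 0 & 0
\end{bmatrix}.
\]

The third column signifies that $D(v'_3) = -v'_1 + 2v'_2$, while the fourth
column signifies that $D(v'_4) = -v'_1 - v'_2 + 3v'_3$.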
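The matrix arithmetic of this example is easy to confirm by machine. The Python sketch below multiplies the three matrices with a small helper; the variable names (`P` for the transition matrix from $S'$ to $S$, `P_inv` for its inverse, `D_S` for the standard matrix of $D$) are labels chosen here, not notation from the text.

```python
def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Columns of P are the S-coordinates of 1, 1+t, 1+t+t^2, 1+t+t^2+t^3.
P = [[1, 1, 1, 1],
     [0, 1, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
P_inv = [[1, -1, 0, 0],
         [0, 1, -1, 0],
         [0, 0, 1, -1],
         [0, 0, 0, 1]]
# Matrix of d/dt with respect to (1, t, t^2, t^3).
D_S = [[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
assert matmul(P_inv, P) == identity   # the two transition matrices are inverse

D_S_prime = matmul(matmul(P_inv, D_S), P)
for row in D_S_prime:
    print(row)
# [0, 1, -1, -1]
# [0, 0, 2, -1]
# [0, 0, 0, 3]
# [0, 0, 0, 0]
```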


4.3 The Rank-Nullity Theorem 


Let $V$ be an $n$-dimensional vector space. The qualitative structure of a
linear transformation $T : V \to W$ is simple: “Some of the dimensions
of $V$ are sent to $0^W$; the rest are mapped injectively”. We prove
two precise statements of this result, one very concrete, one abstract.
A third proof strategy, intermediate in abstraction, may be found in
Exercise 4.15.


Definition 4.43. Let $A = [A^i_j]$ be an $m \times n$ matrix, with rows $(A^i)_{i=1}^m$
in $(\mathbf{R}^n)^*$ and columns $(A_j)_{j=1}^n$ in $\mathbf{R}^m$.

The nullspace of $A$, $\operatorname{Null}(A)$, is the solution space of the
homogeneous system $Ax = 0^m$. The nullity of $A$ is the dimension of the
nullspace of $A$.

The column space of $A$, $\operatorname{Col}(A)$, is $\operatorname{Span}(A_j)_{j=1}^n \subseteq \mathbf{R}^m$. The column
rank of $A$ is the dimension of the column space of $A$.

The row space of $A$, $\operatorname{Row}(A)$, is $\operatorname{Span}(A^i)_{i=1}^m \subseteq (\mathbf{R}^n)^*$. The row
rank of $A$ is the dimension of the row space of $A$.


Remark 4.44. If $\mu_A : \mathbf{R}^n \to \mathbf{R}^m$ is multiplication by $A$, then the
nullspace of $A$ is the kernel of $\mu_A$, and the column space of $A$ is the
image of $\mu_A$.

Lemma 4.45. If $A$ is an $m \times n$ matrix, $E$ is a product of $m \times m$
elementary row matrices, and $A' = EA$, then

(i) $\operatorname{Null}(A) = \operatorname{Null}(A')$.

(ii) $\dim\operatorname{Col}(A) = \dim\operatorname{Col}(A')$.

(iii) $\operatorname{Row}(A) = \operatorname{Row}(A')$.

Proof. (i). This restates Proposition 1.38. (Elementary row operations
do not change the homogeneous solution space.)

(ii). Multiplication by $E$ defines an invertible (hence injective)
linear transformation $\mu_E : \mathbf{R}^m \to \mathbf{R}^m$ sending $\operatorname{Col}(A)$ to $\operatorname{Col}(A')$.
Corollary 4.8 implies $\dim\operatorname{Col}(A) = \dim\operatorname{Col}(A')$.

(iii). Each elementary row operation replaces a set of rows with a
set of linear combinations of the rows, so $\operatorname{Row}(A') \subseteq \operatorname{Row}(A)$. Since
$E^{-1}$ is a product of elementary matrices, reversing the roles of $A$ and $A'$
gives $\operatorname{Row}(A) \subseteq \operatorname{Row}(A')$.


Theorem 4.46 (Rank-Nullity for Matrices). If $A$ is an $m \times n$ matrix, then

(i) $\dim\operatorname{Col}(A) = \dim\operatorname{Row}(A)$.

(ii) $\dim\operatorname{Null}(A) + \dim\operatorname{Col}(A) = n$.


Proof. (i). Let $A'$ be the reduced row-echelon form of $A$, and let $r$
denote the number of non-zero rows of $A'$, i.e., the number of leading $1$'s.
Since the rows of $A'$ are linearly independent, $\dim\operatorname{Row}(A') = r$.

Since every column of $A'$ has at most $r$ non-zero components, we
have $\dim\operatorname{Col}(A') \le r$. But the $r$ columns containing leading $1$'s are
obviously linearly independent, so $\dim\operatorname{Col}(A') = r$.

By Lemma 4.45,
\[ \dim\operatorname{Col}(A) = \dim\operatorname{Col}(A') = r = \dim\operatorname{Row}(A') = \dim\operatorname{Row}(A). \]

(ii). By Theorem 2.76, $\dim\operatorname{Null}(A)$ is the number of free variables,
namely $(n - r)$, the number of columns not containing a leading $1$,
while $\dim\operatorname{Col}(A) = r$ is the number of basic variables. The sum of
these dimensions is $n$.
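The proof is effectively an algorithm: row-reduce, count the leading $1$'s, and read off rank and nullity. The Python sketch below carries this out with exact fractions. The function name `rref` and the sample matrix are illustrative assumptions; the routine is one standard way to compute reduced row-echelon form, not necessarily the procedure of Chapter 1.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot column indices)."""
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    pivots = []
    r = 0  # index of the next pivot row
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if a[i][col] != 0), None)
        if pivot is None:
            continue  # no leading entry here: a free-variable column
        a[r], a[pivot] = a[pivot], a[r]
        lead = a[r][col]
        a[r] = [x / lead for x in a[r]]  # scale to get a leading 1
        for i in range(m):
            if i != r and a[i][col] != 0:
                factor = a[i][col]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append(col)
        r += 1
    return a, pivots

A = [[1, 2, 0, 1],
     [2, 4, 1, 4],
     [3, 6, 1, 5]]
n = len(A[0])

_, pivots = rref(A)
rank = len(pivots)       # dim Col(A): number of leading 1's
nullity = n - rank       # dim Null(A): number of free variables
print(pivots, rank, nullity)  # [0, 2] 2 2

# Part (i): the transpose has the same rank.
A_T = [list(col) for col in zip(*A)]
print(len(rref(A_T)[1]))      # 2
```

The pivot column indices also identify which columns of the original matrix may be kept to obtain a basis of the column space.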


Definition 4.47. If $A$ is an $m \times n$ matrix, the rank of $A$ is the common
value $\dim\operatorname{Col}(A) = \dim\operatorname{Row}(A)$.

Proposition 4.48. Let $A$ be an $m \times n$ matrix. If the basic variables
of the homogeneous system $Ax = 0^m$ are in columns $(k_j)_{j=1}^r$, the
corresponding columns $(A_{k_j})_{j=1}^r$ are a basis of $\operatorname{Col}(A)$.


Remark 4.49. That is, one way to find a basis of $\operatorname{Col}(A)$ is to
row-reduce $A$, then select the columns of $A$ itself corresponding to the
leading $1$'s of the row-echelon matrix.


Proof. Let $B$ be the $m \times r$ matrix whose $j$th column is $A_{k_j}$. (In words,
discard all the “free variable” columns of $A$.) The reduced row-echelon
form $B'$ is $I_r$ followed by $m - r$ rows of $0$'s, so
\[ \dim\operatorname{Col}(A) = r = \dim\operatorname{Col}(B') = \dim\operatorname{Col}(B). \]

Since $B$ has $r$ columns, its columns are linearly independent, hence
form a basis of $\operatorname{Col}(A)$.


Quotient Vector Spaces 


Our second proof of the Rank-Nullity Theorem is the direct analog of 
the First Homomorphism Theorem from group theory. 


Definition 4.50. Let $T : V \to W$ be a linear transformation. The nullity
of $T$ is $\dim\ker(T)$. The rank of $T$ is $\dim T(V)$.


Proposition 4.51. Let $(V, +, \cdot)$ be a vector space, $K$ a subspace. The
binary relation $x_1 \equiv x_2 \pmod{K}$ if and only if $x_2 - x_1 \in K$ is an
equivalence relation on $V$.


Proof. (Reflexivity). If $x \in V$, then $x - x = 0^V \in K$; by definition,
$x \equiv x \pmod{K}$.

(Symmetry). If $x_1 \equiv x_2 \pmod{K}$, then $x_2 - x_1 \in K$. Since
$K$ is closed under additive inversion, $x_1 - x_2 = -(x_2 - x_1) \in K$, so
$x_2 \equiv x_1 \pmod{K}$.

(Transitivity). If $x_1 \equiv x_2 \pmod{K}$ and $x_2 \equiv x_3 \pmod{K}$, then
by hypothesis, $x_2 - x_1 \in K$ and $x_3 - x_2 \in K$. Since $K$ is closed under
addition, $x_3 - x_1 = (x_3 - x_2) + (x_2 - x_1) \in K$, proving that
$x_1 \equiv x_3 \pmod{K}$.


Definition 4.52. Let $(V, +, \cdot)$ be a vector space, $K$ a subspace. For
each $x_0$ in $V$, the set
\[ x_0 + K = \{x \text{ in } V : x = x_0 + v \text{ for some $v$ in $K$}\} \]
is called a coset of $K$ in $V$.

Remark 4.53. The equivalence classes of the relation in Proposition 4.51
are precisely the cosets of $K$ in $V$: If $x$ and $x_0$ are arbitrary vectors
in $V$, then $x_0 \equiv x \pmod{K}$ if and only if $x - x_0 \in K$, if and only if
$x \in x_0 + K$.


Lemma 4.54. Let $(V, +, \cdot)$ be a vector space, $K$ a subspace. If
$x_1 \equiv x_2 \pmod{K}$ and $y_1 \equiv y_2 \pmod{K}$, and if $c$ is real, then
\[ cx_1 + y_1 \equiv cx_2 + y_2 \pmod{K}. \]


Proof. By hypothesis, there exist elements $x_0$ and $y_0$ of $K$ such that
$x_2 - x_1 = x_0$ and $y_2 - y_1 = y_0$. Thus
\[ (cx_2 + y_2) - (cx_1 + y_1) = c(x_2 - x_1) + (y_2 - y_1) = cx_0 + y_0 \in K. \]


Remark 4.55. Lemma 4.54 guarantees that addition and scalar
multiplication of cosets is “well-defined”, i.e., depends only on the cosets,
and not on the representative elements of $V$.

The coset $K = 0^V + K$ acts as identity element for addition, i.e.,
as the zero vector. The axioms for a vector space are mere formalities,
amounting to appending “$+K$” to each of the axioms for $V$.


Definition 4.56. Let $(V, +, \cdot)$ be a vector space, $K$ a subspace. The
set of cosets of $K$ in $V$ is denoted $V/K$. For $x$, $y$ in $V$ and for real $c$,
we define
\[ (x + K) \boxplus (y + K) = (x + y) + K, \qquad c \boxdot (x + K) = (cx) + K. \]

The vector space $(V/K, \boxplus, \boxdot)$ is called the quotient of $V$ by $K$.


Theorem 4.57. If $(V, +, \cdot)$ is a vector space of dimension $n$, and if
$K$ is a subspace of dimension $k$, the quotient $(V/K, \boxplus, \boxdot)$ is a vector
space of dimension $(n - k)$.

Proof. Pick a basis $(v_j)_{j=1}^k$ for $K$, and extend to a basis $S$ of $V$ by
appending vectors $(v_j)_{j=k+1}^n$. Write $V' = \operatorname{Span}(v_j)_{j=k+1}^n$ and note
that $V = K \oplus V'$. It suffices to show that the cosets $(v_j + K)_{j=k+1}^n$ are
a basis of $V/K$.

(Spanning). If $x + K \in V/K$, then $x$ can be written as a linear
combination from $S$, say
\[ x = \sum_{j=1}^{n} x^j v_j = \sum_{j=1}^{k} x^j v_j + \sum_{j=k+1}^{n} x^j v_j. \]
Calling the sums $x_0$ and $x'$, respectively, we have $x_0 \in K$, so that
$x \equiv x' \pmod{K}$, and $x' \in V'$. As cosets (i.e., as elements of $V/K$),
we have
\[ x + K = x' + K = \sum_{j=k+1}^{n} x^j (v_j + K). \]

(Linear independence). Suppose $x^{k+1}$, ..., $x^n$ are scalars such that
\[ 0^{V/K} = \sum_{j=k+1}^{n} x^j (v_j + K) = \Bigl(\sum_{j=k+1}^{n} x^j v_j\Bigr) + K. \]

The expression $x_0$ in parentheses lies in $V'$ by definition, and lies in $K$
because $x_0 + K = 0^{V/K}$. Since $K \cap V' = \{0^V\}$, we have $x_0 = 0^V$.
Since $(v_j)_{j=k+1}^n$ is linearly independent, $x^j = 0$ for each $j$.


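Theorem 4.58. Let $V$ and $W$ be vector spaces. If $T : V \to W$ is a
linear transformation with kernel $K$, there is a vector space isomorphism
$\overline{T} : V/K \to T(V)$ defined by $\overline{T}(x + K) = T(x)$.


Proof. ($\overline{T}$ is well-defined). If $x_1 + K = x_2 + K$ for some $x_1$, $x_2$ in $V$,
then $x_2 - x_1 \in K = \ker(T)$, so $0^W = T(x_2 - x_1) = T(x_2) - T(x_1)$, or
$T(x_1) = T(x_2)$. That is, $\overline{T}(x_1 + K) = \overline{T}(x_2 + K)$.

(Linearity). For $x$, $y$ in $V$ and real $c$, the linear combination of the
cosets $x + K$ and $y + K$ with coefficient $c$ is $(cx + y) + K$, which
$\overline{T}$ sends to $T(cx + y) = cT(x) + T(y) = c\,\overline{T}(x + K) + \overline{T}(y + K)$.

(Injectivity). $\overline{T}(x_0 + K) = T(x_0) = 0^W$ if and only if $x_0 \in K = \ker(T)$,
if and only if $x_0 + K = 0^{V/K}$.

(Surjectivity). Suppose $y \in T(V)$. By hypothesis, there exists an $x$
in $V$ such that $T(x) = y$. By definition, $\overline{T}(x + K) = T(x) = y$.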


Corollary 4.59 (The Rank-Nullity Theorem). Suppose $V$ is a
finite-dimensional vector space, $W$ an arbitrary vector space. If $T : V \to W$
is a linear transformation, then
\[ \dim\ker(T) + \dim T(V) = \dim V. \]


Proof. Theorem 4.58 says T(V) is isomorphic to V/ker(T). Theo- 
rem 4.57 immediately implies the claim. 


Corollary 4.60. Let $V$ and $W$ be vector spaces of dimension $n$. A
linear transformation $T : V \to W$ is injective if and only if it is
surjective.

Proof. By the Rank-Nullity Theorem, $\dim V = \dim\ker(T) + \dim T(V)$.
By hypothesis, $\dim V = \dim W$. Thus, $T$ is injective if and only if
$\dim\ker(T) = 0$, if and only if $\dim T(V) = \dim V = \dim W$, if and only
if $T(V) = W$, if and only if $T$ is surjective.


Remark 4.61. The conclusion of the Rank-Nullity Theorem is true if
$\dim V = \infty$, in the sense that at least one of the kernel and the image
is infinite-dimensional.


Remark 4.62. Let $A$ be a real $m \times n$ matrix, $\mu_A : \mathbf{R}^n \to \mathbf{R}^m$ the
multiplication map.

The dimension of $\mathbf{R}^n$ is the number of columns of $A$.

The kernel of $\mu_A$ is the solution space of the homogeneous system
$Ax = 0^m$; its dimension is the number of free variables after the
coefficient matrix is put into row-echelon form.

A vector $b$ in $\mathbf{R}^m$ is in the image of $\mu_A$ if and only if $Ax = b$ is con-
sistent. The solution set is a coset of the homogeneous solution space.
This recapitulates our earlier result that an arbitrary non-homogeneous
solution $x$ is the sum of any particular solution $x_0$ and a homogeneous
solution $x_h$.

By the Rank-Nullity Theorem, the dimension of the image is the
number of columns minus the number of free variables, i.e., the number
of basic variables.
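
The bookkeeping in Remark 4.62 can be checked mechanically. The following Python sketch is an illustration only: the function name `rref_pivots` and the sample matrix are assumptions introduced here, not notation from the text. It row-reduces a matrix with exact rational arithmetic; pivot columns correspond to basic variables, the remaining columns to free variables.

```python
from fractions import Fraction

def rref_pivots(rows):
    """Row-reduce a matrix (list of rows); return the list of pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        # Look for a non-zero entry in column c, at or below row r.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # column c corresponds to a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return pivots

# Hypothetical 3 x 4 example: the third row is the sum of the first two.
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
n = len(A[0])                  # number of columns = dim of the domain
rank = len(rref_pivots(A))     # number of basic variables = dim of the image
nullity = n - rank             # number of free variables = dim of the kernel
```

Here the pivots fall in the first and third columns, so the image and the kernel are each 2-dimensional, in agreement with the Rank-Nullity Theorem.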


Exercises 


Exercise 4.1. In the vector space $P_3$ of cubic polynomials, consider
the basis $S' = (1,\ 1+t,\ 1+t+\tfrac12 t^2,\ 1+t+\tfrac12 t^2+\tfrac16 t^3) = (v'_j)_{j=1}^4$. Use the
techniques (and results, as appropriate) of Example 4.42 to calculate
$[D]^{S'}_{S'}$.


Exercise 4.2. Let $V$ be the vector space $P_2$ of quadratic polynomials,
and let $W = \mathbf{R}^3$, both with the usual vector space operations. Define
$T : V \to W$ by $T(p) = (p(-1), p(0), p(1))$.

(a) Show directly that $T$ is linear.

(b) Find the matrix of $T$ with respect to the standard bases.

(c) Determine (with justification) whether or not $T$ is an isomorphism.

(d) If possible, solve $T(p) = (1, 2, 3)$ for $p$.
Exercise 4.3. Each part refers to the matrix

$$A = \begin{bmatrix} -1 & 2 & -3 \\ -2 & 4 & -6 \\ 1 & -2 & 3 \end{bmatrix}$$

and the associated linear transformation $\mu_A : \mathbf{R}^3 \to \mathbf{R}^3$.

(a) Use row reduction to find a basis for $\ker(\mu_A)$, and for $\operatorname{im}(\mu_A)$.
Suggestion: Use Rank-Nullity to find $\dim\operatorname{im}(\mu_A)$.

(b) Find vectors $v$ and $w$ in $\mathbf{R}^3$ such that $A = wv^T$, and verify that
$\ker(\mu_A) = \operatorname{Span}(v)^\perp$ and $\operatorname{im}(\mu_A) = \operatorname{Span}(w)$.


Exercise 4.4. Let S = (el, e7, e4, e%) be the standard basis of R?*?. 
Calculate the matrices with respect to $S$ of (a) The transpose operator;
(b) The operator $T(A) = A + A^T$; (c) The operator $T(A) = A - A^T$.


Exercise 4.5. Let $V \subseteq F(\mathbf{R},\mathbf{R})$ be the subspace spanned by the set
$S = (\cos, \sin) = (v_j)_{j=1}^2$.

(a) Show that differentiation defines an operator $D : V \to V$, and find
the matrix $J = [D]^S_S$.


(b) Calculate J?, the square of the matrix you found in (a), and inter- 
pret the result in terms of second derivatives of cos and sin. 


Exercise 4.6. Let n be a positive integer. 


(a) Prove that $P_n$ is isomorphic (as a vector space) to $\mathbf{R}^{n+1}$.
Suggestion: There is an “obvious” way to map a polynomial of
degree at most $n$ to an ordered tuple of coefficients.

(b) Let $t_0$ be a real number, and define the “evaluation functional”
$E_{t_0} : P_n \to \mathbf{R}$ by $E_{t_0}(p) = p(t_0)$. Use the Rank-Nullity Theorem to
prove that $\ker(E_{t_0}) = \{p \text{ in } P_n : p(t_0) = 0\}$ is $n$-dimensional.

(c) In the notation of part (b), show that $S = ((t - t_0)t^{j})_{j=0}^{n-1}$ is a basis
of $\ker(E_{t_0})$.


Exercise 4.7. Let $a$, $b$, and $c$ be real numbers, not all 0,

$$A = \begin{bmatrix} 0 & c & -b \\ -c & 0 & a \\ b & -a & 0 \end{bmatrix},$$

and let $\mu_A : \mathbf{R}^3 \to \mathbf{R}^3$ be multiplication by $A$.

(a) Show that $\langle Ax, y\rangle = -\langle x, Ay\rangle$ for all $x$, $y$ in $\mathbf{R}^3$.
Hint: Recall that $\langle x, y\rangle = x^T y$.

(b) Find the rank of $\mu_A$.
Suggestion: Split into cases (i) $a \neq 0$; (ii) $a = 0$ and $b \neq 0$; and
(iii) $a = b = 0$, $c \neq 0$.

(c) Find the nullity of $\mu_A$, give a basis for the kernel, and use part (a)
to show that $\mathbf{R}^3$ is the orthogonal direct sum $\ker(\mu_A) \oplus \operatorname{im}(\mu_A)$.


Exercise 4.8. Prove from the definition that a composition of linear 
transformations is linear. 


Exercise 4.9. Let $T : V \to V'$ and $T' : V' \to V''$ be linear transfor-
mations. Prove:

(a) $\ker(T) \subseteq \ker(T'T)$. (b) $\operatorname{im}(T'T) \subseteq \operatorname{im}(T')$.


Exercise 4.10. Suppose $A$, $B$, and $P$ are $n \times n$ matrices, and that
$B = P^{-1}AP$. Prove by mathematical induction that if $N$ is a positive
integer, then $B^N = P^{-1}A^N P$. (Note carefully that this is not what
you might expect by “distributing” the exponent!)


Exercise 4.11. Let $a_1$ and $a_2$ be real numbers, and define

$$A = \begin{bmatrix} a_1 & a_2 \\ a_2 & a_1 \end{bmatrix}, \qquad v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

(a) Show $Av_1 = (a_1 + a_2)v_1$. Establish a similar formula for $Av_2$.

(b) Let $S = (e_1, e_2)$ be the standard basis of $\mathbf{R}^2$, and $S' = (v_1, v_2)$.
Calculate the transition matrices $[I]^{S'}_{S}$ and $[I]^{S}_{S'}$.

(c) Let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the linear transformation whose standard
matrix $[T]^S_S$ is $A$. Find the matrix $[T]^{S'}_{S'}$.

(d) If $N$ is a positive integer, find a formula for the entries of $A^N$.
Suggestion: Use Exercise 4.10.


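Exercise 4.12. Let $\sigma$ be a permutation (i.e., bijection) of the set
$\{1, 2, \dots, n\}$, and let $P_\sigma$ be the associated permutation matrix, see Ex-
ercise 1.13. If $\Lambda = \operatorname{diag}[\lambda^1, \lambda^2, \dots, \lambda^n]$ for some numbers $\lambda^j$, calculate
$P_\sigma \Lambda P_\sigma^{-1}$.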
Exercise 4.13. In each part, let $V$ be the space of smooth (infinitely-
differentiable) real-valued functions on $\mathbf{R}$, let $V_0$ be the subspace of
functions $f$ satisfying $f(0) = 0$, and let $D$ and $S$ denote the differenti-
ation and integration operators

$$(Df)(t) = f'(t), \qquad (Sf)(t) = \int_0^t f(s)\,ds.$$

Use theorems from calculus to justify your answer in each part. (Exer-
cise 4.9 may also be helpful.)

(a) Calculate the compositions $D \circ S$ and $S \circ D$. Is either composition
the identity map $I_V$?

(b) Show $S$ is injective, and find the image.

(c) Show $D$ is surjective, and find the kernel.

(d) Does $S$ map $V_0$ to itself? If so, is $S : V_0 \to V_0$ an isomorphism?

(e) Does $D$ map $V_0$ to itself? If so, is $D : V_0 \to V_0$ an isomorphism?


Exercise 4.14. Let $\ell$ be a positive real number. A function $f : \mathbf{R} \to \mathbf{R}$
is $\ell$-periodic if $f(t + \ell) = f(t)$ for all real $t$. With the notation of the
preceding exercise:

(a) Prove that the space of $\ell$-periodic functions is a vector subspace
of $F(\mathbf{R},\mathbf{R})$. (Consequently, the set $W$ of smooth $\ell$-periodic func-
tions is a subspace of $V$.)

(b) Show that $D$ maps $W$ to $W$. (That is, the derivative of a smooth,
$\ell$-periodic function is $\ell$-periodic.)

(c) Show that if $f \in W$, then $Sf \in W$ if and only if $\int_0^\ell f(t)\,dt = 0$.

Exercise 4.15. Let $V$ be a finite-dimensional vector space with basis
$S = (v_j)_{j=1}^n$, let $T : V \to W$ be a linear transformation, and put
$w_j = T(v_j)$ for each $j$. Prove from the definitions (not using results
proven in this chapter) that:

(a) The set $T(S) = (w_j)_{j=1}^n$ spans the image $T(V)$.

(b) If $(v_j)_{j=1}^k$ is a basis for $\ker(T)$, then the set $(w_j)_{j=k+1}^n$ is a basis
of the image $T(V)$.

(c) Use part (b) to deduce the Rank-Nullity Theorem (Corollary 4.59).
Diagonalization


5.1 Linear Operators 


If $(V, +, \cdot)$ is a vector space, a linear transformation $T : V \to V$ is called
a linear operator (on $V$). If $V$ is finite-dimensional, and if $S$ and $S'$
are bases, an operator $T$ may be expressed as a matrix with respect to
either basis, and the two matrices are similar:

$$[T]^{S'}_{S'} = [I_V]^{S'}_{S}\,[T]^{S}_{S}\,[I_V]^{S}_{S'}, \quad\text{or}\quad A' = P^{-1}AP.$$

Given a linear operator $T$, we would like to find a basis $S'$ with respect
to which the matrix $A'$ of $T$ is “as simple as possible”.* For example, we
might want $A'$ to be a diagonal matrix or an upper triangular matrix.
Can we arrange this, and if so, how can we calculate a basis $S'$ for a
specific operator?


Remark 5.1. If $A'$ were a scalar matrix, i.e., if $A' = cI_n$ for some real
number $c$, the condition $A' = P^{-1}AP$ would imply

$$A = PA'P^{-1} = P(cI_n)P^{-1} = cI_n,$$

see Theorem 1.19 (iii). Consequently, every matrix similar to a scalar
matrix is a scalar matrix.


Definition 5.2. Let $(V, +, \cdot)$ be an $n$-dimensional real vector space. A
linear operator $T : V \to V$ is diagonalizable if there exists a basis $S'$
of $V$ such that $A' = [T]^{S'}_{S'}$ is diagonal.


*The technical term is a canonical form for $T$, or for $A$. A matrix does not have
a unique “canonical form”; the notion of “simplest” depends on one’s mathematical
intent.


The following characterization is immediate from the definition, but
stated formally for emphasis.

Proposition 5.3. Let $(V, +, \cdot)$ be an $n$-dimensional real vector space,
$S' = (v_j)_{j=1}^n$ a basis of $V$, and $T : V \to V$ a linear operator. The
matrix $A' = [T]^{S'}_{S'}$ is diagonal if and only if there exist scalars $(\lambda^j)_{j=1}^n$
such that $T(v_j) = \lambda^j v_j$ for each $j$.

Remark 5.4. In the notation of the proposition,

$$[T]^{S'}_{S'} = \operatorname{diag}[\lambda^1, \lambda^2, \dots, \lambda^n] = \operatorname{diag}[\lambda^j] =
\begin{bmatrix} \lambda^1 & 0 & \cdots & 0 \\ 0 & \lambda^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^n \end{bmatrix}.$$


5.2 Eigenvalues and Eigenvectors 


Definition 5.5. Let $T : V \to V$ be a linear operator. A real number $\lambda$
is an eigenvalue of $T$ if the operator $(T - \lambda I_V)$ is not invertible.

A non-zero vector $v$ in $V$ is a $\lambda$-eigenvector of $T$ if $T(v) = \lambda v$.

If $\lambda$ is an eigenvalue of $T$, the subspace

$$E_\lambda = \ker(T - \lambda I_V) = \{v \text{ in } V : T(v) = \lambda v\}$$

is the $\lambda$-eigenspace of $T$.


Remark 5.6. If $\lambda$ is an eigenvalue of $T$, the $\lambda$-eigenspace consists of all
$\lambda$-eigenvectors together with the zero vector $0^V$, which by definition is
not an eigenvector but does satisfy $T(v) = \lambda v$.

Eigenspaces of an operator are “invariants”, depending only on the
operator, not on any choice of basis of $V$.

Every linear operator on a 1-dimensional space is scalar, hence di-
agonal with respect to an arbitrary basis.

Remark 5.7. Proposition 5.3 says $T$ is diagonalizable if and only if
$V$ has a basis consisting entirely of eigenvectors of $T$. For brevity, such
a basis is called a $T$-eigenbasis of $V$.

Example 5.8. If $A = \operatorname{diag}[\lambda^1, \lambda^2, \dots, \lambda^n]$, then every standard basis
vector is an eigenvector: $Ae_j = \lambda^j e_j$. If the $\lambda^j$ are distinct, the eigen-
spaces of $A$ are precisely the coordinate axes. If there are repetitions
among the diagonal entries, the standard basis vectors corresponding
to a single eigenvalue $\lambda$ span the $\lambda$-eigenspace.
Example 5.9. A linear operator $T$ has eigenvalue 0 if and only if $T$ is
not invertible, in which case $E_0 = \ker(T)$.

Example 5.10. Let $a$ and $b$ be real numbers, and let $T : \mathbf{R}^2 \to \mathbf{R}^2$
have standard matrix

$$[T]^S_S = A = \begin{bmatrix} a & b \\ b & a \end{bmatrix}.$$

The vector $v_1 = (1, 1)$ is easily checked to be an $(a + b)$-eigenvector,
while $v_2 = (1, -1)$ is an $(a - b)$-eigenvector. It follows that $T$ is diag-
onalizable, and if $S' = (v_j)_{j=1}^2$, then $[T]^{S'}_{S'} = A' = \operatorname{diag}[a + b, a - b]$.
(Checking this by direct computation is worthwhile, see Exercise 4.11.)

If $b \neq 0$, the eigenvalues $a \pm b$ of $T$ are distinct, and there are two
1-dimensional eigenspaces:

$$E_{a+b} = \operatorname{Span}(v_1), \qquad E_{a-b} = \operatorname{Span}(v_2).$$

If $b = 0$, there is one eigenvalue, $a$, and $E_a = \mathbf{R}^2$.


Example 5.11. Let $P$ denote the space of polynomials in one variable,
and $D$ the derivative operator.

A non-zero constant polynomial is a 0-eigenvector for $D$. There are
no other eigenvectors in $P$: If $p$ is a polynomial of degree $n \geq 1$, then
$D(p) = p'$ is a non-zero polynomial of degree $n - 1 \geq 0$, while every
scalar multiple of $p$ is either zero or of degree $n$.

The operator $T(p)(t) = t\,p'(t)$ on $P$ has each non-negative integer $k$ as
eigenvalue, since $T(t^k) = t(kt^{k-1}) = kt^k$ for $k \geq 1$ and $T(1) = 0$. Since
these eigenvectors span $P$, it turns out there are no other eigenvalues
or eigenspaces.

Remark 5.12. If $T : V \to V$ is an operator and $c$ is real, then $\lambda$ is an
eigenvalue of $T$ if and only if $(\lambda - c)$ is an eigenvalue of $T - cI_V$.


Theorem 5.13. Let $(V, +, \cdot)$ be an $n$-dimensional real vector space,
$T : V \to V$ a linear operator.

(i) If $\lambda$ is an eigenvalue of $T$, then $\dim E_\lambda \geq 1$.

(ii) If $\lambda^1 > \lambda^2 > \cdots > \lambda^r$ are eigenvalues of $T$, and if $S = (v_j)_{j=1}^r$ is
a set of corresponding eigenvectors, i.e., if $T(v_j) = \lambda^j v_j$ for each $j$,
then $S$ is linearly independent.

(iii) $T$ is diagonalizable if and only if its eigenspaces span $V$.
Proof. (i). If $\lambda$ is an eigenvalue of $T$, then $T - \lambda I_V$ is not invertible.
By Corollary 4.60, $T - \lambda I_V$ is not injective, so $\dim\ker(T - \lambda I_V) \geq 1$.
(Concretely, there is a non-zero vector $v$ such that $(T - \lambda I_V)(v) = 0^V$,
i.e., $T(v) = \lambda v$.)

(ii). The proof is by induction on $r$. The case $r = 1$ is obvious:
an eigenvector is non-zero by definition, so a set of one eigenvector is
linearly independent. Assume inductively for some $k \geq 1$ that the set
$(v_j)_{j=1}^k$ of eigenvectors is linearly independent, and suppose

$$(*) \qquad 0^V = \sum_{j=1}^{k+1} x^j v_j$$

for some scalars $x^j$. Applying the operator $T - \lambda^{k+1} I_V$ to the preceding
equation gives

$$0^V = \sum_{j=1}^{k+1} x^j(\lambda^j - \lambda^{k+1})\,v_j = \sum_{j=1}^{k} x^j(\lambda^j - \lambda^{k+1})\,v_j.$$

The right-hand sum is a linear combination from a linearly independent
set, so the coefficients $x^j(\lambda^j - \lambda^{k+1})$ are all 0. By hypothesis, the
eigenvalues are decreasing with index, so $0 < (\lambda^j - \lambda^{k+1})$; thus $x^j = 0$
for $1 \leq j \leq k$. Substituting into (*), we find $x^{k+1} = 0$ as well; that is,
$(v_j)_{j=1}^{k+1}$ is linearly independent.


(iii). As noted above, $T$ is diagonalizable if and only if there exists
a $T$-eigenbasis of $V$.

Suppose $V$ has a $T$-eigenbasis $S'$. Since every eigenvector of $T$ lies
in some eigenspace, $S' \subseteq \bigcup_\lambda E_\lambda$, and therefore

$$V = \operatorname{Span}(S') \subseteq \operatorname{Span}\Bigl(\bigcup_\lambda E_\lambda\Bigr) = \bigoplus_\lambda E_\lambda.$$

(The final step is Theorem 2.44; the sum is direct by (ii) just proven.)

Conversely, assume the eigenspaces of $T$ span $V$. For each eigen-
value $\lambda$ of $T$, pick a basis $S_\lambda = (v^\lambda_j)_{j=1}^{n_\lambda}$ of $E_\lambda$. It suffices to prove the
union of these sets, $S = \bigcup_\lambda S_\lambda$, is a basis of $V$.

Since $\operatorname{Span}(S_\lambda) = E_\lambda$, Theorem 2.44 gives

$$\operatorname{Span}(S) = \sum_\lambda E_\lambda = V.$$

To prove $S$ is linearly independent, suppose some linear combina-
tion from $S$ is equal to $0^V$. Indexing coefficients (and grouping terms)
according to the corresponding eigenvalue,

$$0^V = \sum_\lambda \Bigl(\sum_{j=1}^{n_\lambda} x^j_\lambda v^\lambda_j\Bigr).$$

This equation has the form $0^V = \sum_\lambda v^\lambda$, with $v^\lambda$ in $E_\lambda$. By (ii),
each $v^\lambda = 0^V$; if any $v^\lambda$ were non-zero, the non-trivial linear combination
$\sum_\lambda v^\lambda$ would be non-zero. Fix an eigenvalue $\lambda$ arbitrarily. Since

$$0^V = v^\lambda = \sum_{j=1}^{n_\lambda} x^j_\lambda v^\lambda_j$$

and $S_\lambda = (v^\lambda_j)_{j=1}^{n_\lambda}$ is linearly independent, all the scalars $x^j_\lambda$ are zero;
only the trivial linear combination from $S$ gives the zero vector.


Corollary 5.14. If $\dim V = n$ and $T$ has $n$ distinct real eigenvalues,
then $T$ is diagonalizable.

Proof. For each eigenvalue $\lambda$ there exists a $\lambda$-eigenvector by (i); these
$n$ vectors are linearly independent by (ii), so are a basis of $V$.


Example 5.15. If $v$ and $w$ are arbitrary non-zero elements of $\mathbf{R}^n$, there
is a rank-one operator $T(x) = \langle v, x\rangle\,w$, with standard matrix $wv^T$,
see Example 4.31.

If $\langle v, x\rangle = 0$, then $T(x) = 0^n$; that is, the 0-eigenspace contains
$\operatorname{Span}(v)^\perp$, which has dimension $(n - 1)$. (The 0-eigenspace is not $\mathbf{R}^n$,
because $wv^T \neq 0^{n \times n}$, so in fact $E_0 = \operatorname{Span}(v)^\perp$.)

If $x$ is an eigenvector of $T$ and $\langle v, x\rangle \neq 0$, then $\langle v, x\rangle\,w = \lambda x$. It
follows that $x = cw$ with $c \neq 0$, in which case $\lambda = \langle v, w\rangle$.

If $\langle v, w\rangle \neq 0$, then $E_\lambda = \operatorname{Span}(w)$. Consequently, $\mathbf{R}^n = E_0 \oplus E_\lambda$,
and $T$ is diagonalizable. If instead $\langle v, w\rangle = 0$, then all $n$ eigenvalues
of $T$ are 0 but $T \neq 0$, so $T$ is not diagonalizable.
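
Both cases of Example 5.15 are easy to test numerically. The sketch below is illustrative only; the helper names and the particular vectors are assumptions chosen for this check. It builds the standard matrix $wv^T$, confirms that $w$ is an eigenvector with eigenvalue $\langle v, w\rangle$ and that a vector orthogonal to $v$ is sent to zero, and then exhibits the non-diagonalizable case $\langle v, w\rangle = 0$, in which the operator is non-zero but squares to zero.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def rank_one(v, w):
    """Standard matrix w v^T of the operator T(x) = <v, x> w."""
    return [[wi * vj for vj in v] for wi in w]

def apply(A, x):
    return [dot(row, x) for row in A]

def mat_mul(X, Y):
    return [[dot(row, col) for col in zip(*Y)] for row in X]

# Case <v, w> != 0: w spans the eigenspace for the eigenvalue <v, w>.
v, w = [1, 2, 2], [1, 0, 1]
A = rank_one(v, w)
lam = dot(v, w)               # eigenvalue <v, w> = 3
Aw = apply(A, w)              # equals lam * w
Ax = apply(A, [2, -1, 0])     # [2, -1, 0] is orthogonal to v, so Ax = 0

# Case <v, w> = 0: the operator is non-zero, but its square is zero.
B = rank_one([1, 0, 0], [0, 1, 0])
B_squared = mat_mul(B, B)
```

In the second case every eigenvalue is 0, so a diagonal form would have to be the zero matrix, which is impossible for a non-zero operator.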


Involutions 


Definition 5.16. Let $V$ be an arbitrary vector space (possibly infinite-
dimensional). An operator $T$ satisfying $T^2 = I_V$ is an involution of $V$.

Proposition 5.17. If $T$ is an involution, the eigenspaces of $T$ span $V$.

Proof. If $v$ is an arbitrary element of $V$, then the vectors

$$v^{1} = \tfrac12\bigl(v + T(v)\bigr), \qquad v^{-1} = \tfrac12\bigl(v - T(v)\bigr),$$

satisfy $T(v^{1}) = v^{1}$ and $T(v^{-1}) = -v^{-1}$, so they lie in the eigenspaces
for the eigenvalues 1 and $-1$, respectively (compare the proof of
Proposition 2.49). Moreover, $v = v^{1} + v^{-1}$; that is, the
eigenspaces of $T$ span $V$.

Remark 5.18. The eigenvalues of an involution are, a priori, $\pm 1$, since
if $v$ is a $\lambda$-eigenvector, then

$$v = T^2(v) = T(T(v)) = T(\lambda v) = \lambda T(v) = \lambda^2 v.$$

Suppose “optimistically” that every vector decomposes as a sum of
eigenvectors: $v = v^{1} + v^{-1}$. Applying $T$ gives $T(v) = v^{1} - v^{-1}$.
“Solving” for $v^{1}$ and $v^{-1}$ gives the formulas in the preceding proof.

Example 5.19. Let $V = \mathbf{R}^{n \times n}$. The transpose operator $T(A) = A^T$ is
an involution (the transpose of a transpose is the original matrix). Its
eigenspaces are $E_1 = \operatorname{Sym}^n$ (the symmetric matrices) and $E_{-1} = \operatorname{Skew}^n$
(the skew-symmetric matrices), compare Proposition 2.49.
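
For the transpose involution, the formulas $\tfrac12(v \pm T(v))$ from the proof of Proposition 5.17 split a square matrix into its symmetric and skew-symmetric parts. A minimal Python sketch follows; the function names and the sample matrix are illustrative assumptions.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def split(A):
    """Return (plus, minus): the components of A in the eigenspaces
    E_1 (symmetric) and E_{-1} (skew-symmetric) of the transpose operator."""
    At = transpose(A)
    half = Fraction(1, 2)
    plus = [[half * (a + b) for a, b in zip(r, s)] for r, s in zip(A, At)]
    minus = [[half * (a - b) for a, b in zip(r, s)] for r, s in zip(A, At)]
    return plus, minus

A = [[1, 2], [4, 3]]
plus, minus = split(A)
# Recombine the two components entry by entry.
total = [[p + m for p, m in zip(r, s)] for r, s in zip(plus, minus)]
```

For this sample, `plus` is the symmetric matrix with rows (1, 3), (3, 3), `minus` is the skew-symmetric matrix with rows (0, -1), (1, 0), and their sum recovers `A`.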


5.3 The Characteristic Polynomial

Throughout this section, $(V, +, \cdot)$ is a finite-dimensional real vector
space of dimension $n$, and $T : V \to V$ is a linear operator.

Proposition 5.20. If $S$ and $S'$ are bases of $V$, and if $A = [T]^S_S$ and
$A' = [T]^{S'}_{S'}$ are the matrices of $T$ in these bases, then $\det A' = \det A$.

Proof. By Corollary 4.37 (ii), the matrices $A$ and $A'$ are similar. By
Corollary 3.57, similar matrices have the same determinant.

Definition 5.21. The determinant of $T$ is the determinant of the ma-
trix of $T$ with respect to an arbitrary basis of $V$.

The characteristic polynomial of $T$ is

$$\chi_T(\lambda) = \det(T - \lambda I_V) = \det T + \cdots + (-\lambda)^n,$$

of degree $n$. The characteristic equation of $T$ is $\chi_T(\lambda) = 0$.
Remark 5.22. Each summand in a determinant is a signed product of
$n$ matrix entries. Since each entry in the matrix of $T - \lambda I_V$ is either a
number or a linear polynomial in $\lambda$, the determinant has degree at most $n$.
The only term of degree $n$ comes from the product of the diagonal entries
$(A^j_j - \lambda)$, so the coefficient of $\lambda^n$ is $(-1)^n$. The constant term is found
by setting $\lambda = 0$.

Proposition 5.23. A real number $\lambda$ is an eigenvalue of $T$ if and only
if $\chi_T(\lambda) = 0$.

Proof. A real number $\lambda$ is an eigenvalue of $T$ if and only if $T - \lambda I_V$ is
not invertible, if and only if $\chi_T(\lambda) = \det(T - \lambda I_V) = 0$.

Remark 5.24. By the Fundamental Theorem of Algebra, the charac-
teristic polynomial of $T$ has precisely $n$ complex roots, counting mul-
tiplicity. For terminological convenience, we extend the definition of
“eigenvalue” to encompass non-real roots.

Definition 5.25. Let $(V, +, \cdot)$ be a finite-dimensional real vector space,
and $T : V \to V$ a linear operator. A root of the characteristic polyno-
mial of $T$ is an eigenvalue of $T$.


Example 5.26. Let $a$ and $b$ be real, and consider the matrix

$$A = \begin{bmatrix} a & b \\ b & a \end{bmatrix}, \qquad
\chi_A(\lambda) = \det(A - \lambda I_2) = \begin{vmatrix} a - \lambda & b \\ b & a - \lambda \end{vmatrix} = (a - \lambda)^2 - b^2,$$

a difference of squares: $\chi_A(\lambda) = (a + b - \lambda)(a - b - \lambda)$. We recover the
eigenvalues $\lambda = a \pm b$. (Computing corresponding eigenvectors by sub-
stituting each eigenvalue in turn and solving the resulting homogeneous
system is a worthwhile exercise.)

Example 5.27. Let $a$ and $b$ be real, and consider the matrix

$$B = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}, \qquad
\chi_B(\lambda) = \det(B - \lambda I_2) = \begin{vmatrix} a - \lambda & -b \\ b & a - \lambda \end{vmatrix} = (a - \lambda)^2 + b^2.$$

The characteristic equation of $B$ is $\lambda^2 - 2a\lambda + (a^2 + b^2) = 0$. The
quadratic formula gives

$$\lambda = \frac{2a \pm \sqrt{(2a)^2 - 4(a^2 + b^2)}}{2} = a \pm \sqrt{-b^2} = a \pm bi.$$

That is, if $b \neq 0$, the roots of the characteristic equation are non-real.
Because $B$ has no real eigenvalues, $B$ has no real eigenvectors. It follows
that $B$ is not diagonalizable over the real numbers.
Remark 5.28. The matrix $B$ of the preceding example defines a linear
operator on the complex vector space $\mathbf{C}^2$, the set of ordered pairs of
complex numbers equipped with complex scalar multiplication. In this
space, $B$ has two linearly independent eigenvectors: $v_1 = (1, -i)$ and
$v_2 = (1, i)$. For example,

$$Bv_1 = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}\begin{bmatrix} 1 \\ -i \end{bmatrix}
= \begin{bmatrix} a + bi \\ b - ai \end{bmatrix}
= (a + bi)\begin{bmatrix} 1 \\ -i \end{bmatrix} = (a + bi)v_1.$$

Thus, $B$ is diagonalizable over the complex numbers.

Example 5.29. Let $a$ and $b \neq 0$ be real, and consider the matrix

$$C = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix}, \qquad
\chi_C(\lambda) = \det(C - \lambda I_2) = \begin{vmatrix} a - \lambda & b \\ 0 & a - \lambda \end{vmatrix} = (a - \lambda)^2.$$

The only real eigenvalue of $C$ is $a$, of “algebraic multiplicity 2”. (That
is, $\lambda = a$ is a double root of the characteristic equation.)

To find the corresponding eigenspace, we solve the homogeneous
system with coefficient matrix

$$C - aI_2 = b\,e_1^2 = \begin{bmatrix} 0 & b \\ 0 & 0 \end{bmatrix}.$$

Multiplying the first row by $1/b$ puts the coefficient matrix into
reduced row-echelon form. The variable $x^2$ is basic, and $x^1$ is free.
The only eigenvectors are therefore non-zero scalar multiples of
$v_1 = (1, 0)$, i.e.,

$$E_a = \{(x^1, 0) \text{ in } \mathbf{R}^2\}.$$

The eigenspace does not span $\mathbf{R}^2$, so $C$ is not diagonalizable. Because
$\dim E_a = 1$, we say the eigenvalue $a$ has “geometric multiplicity 1”.


Remark 5.30. The preceding three examples typify the general eigen-
space behavior of a real $n \times n$ matrix $A$. In the “best” case, $A$ has
all real eigenvalues and a set of eigenvectors that span $\mathbf{R}^n$, so $A$ is
diagonalizable.

Two things can prevent $A$ from being diagonalizable over the reals.
First, some of the eigenvalues may be non-real. (Because the charac-
teristic polynomial of a real matrix has real coefficients, any non-real
eigenvalues occur in complex conjugate pairs. A matrix with non-real
eigenvalues may be diagonalizable over the complex numbers, i.e., there
may exist an invertible complex $n \times n$ matrix $P$ such that $P^{-1}AP$ is di-
agonal.) Second, $A$ may have only real eigenvalues, but its eigenspaces
may fail to span $\mathbf{R}^n$.
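
The three behaviors can be seen side by side in a short computation. The sketch below is an illustration, not part of the text: it uses the standard fact that a $2 \times 2$ matrix $M$ has characteristic polynomial $\lambda^2 - \operatorname{tr}(M)\lambda + \det M$, and the sample values $a = 3$, $b = 2$ and the function name are assumptions.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of lambda^2 - tr(M) lambda + det(M), by the quadratic formula."""
    (p, q), (r, s) = M
    tr, det = p + s, p * s - q * r
    root = cmath.sqrt(tr * tr - 4 * det)   # complex square root of the discriminant
    return (tr + root) / 2, (tr - root) / 2

a, b = 3.0, 2.0
A = [[a, b], [b, a]]     # Example 5.26: two distinct real eigenvalues a + b, a - b
B = [[a, -b], [b, a]]    # Example 5.27: non-real conjugate pair a + bi, a - bi
C = [[a, b], [0.0, a]]   # Example 5.29: the double real eigenvalue a

eig_A = eigenvalues_2x2(A)
eig_B = eigenvalues_2x2(B)
eig_C = eigenvalues_2x2(C)
```

The eigenvalues alone do not distinguish a scalar matrix from $C$: both have a double root. Deciding diagonalizability in that case still requires computing the eigenspace, as in Example 5.29.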
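Example 5.31. Find the eigenvalues and eigenspaces of the matrix

$$A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & 3 \end{bmatrix}$$

and determine whether $A$ is diagonalizable. If so, find a matrix $P$ so
that $A' = P^{-1}AP$ is diagonal with the eigenvalues in non-increasing
order.

Subtract $\lambda I_3$, then expand the determinant using the formula from
Example 3.47:

$$\begin{aligned}
\det(A - \lambda I_3)
&= \begin{vmatrix} 1 - \lambda & 1 & -1 \\ 1 & 1 - \lambda & 1 \\ 1 & -1 & 3 - \lambda \end{vmatrix} \\
&= (1 - \lambda)\bigl[(1 - \lambda)(3 - \lambda) + 1\bigr] - \bigl[(3 - \lambda) - 1\bigr] - \bigl[(-1) - (1 - \lambda)\bigr] \\
&= (1 - \lambda)\bigl[\lambda^2 - 4\lambda + 4\bigr] + [\lambda - 2] - [\lambda - 2] \\
&= -(\lambda - 1)(\lambda - 2)^2,
\end{aligned}$$

so the eigenvalues (in non-increasing order) are 2, 2, and 1.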
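To find a basis for $E_2$, row-reduce $A - 2I_3$ (add the second row to
the first, and subtract the second row from the third):

$$A - 2I_3 = \begin{bmatrix} -1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & 1 \end{bmatrix}
\longrightarrow \begin{bmatrix} 0 & 0 & 0 \\ 1 & -1 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

The system may be written $x^1 = x^2 - x^3$, so the general solution is

$$\begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix} = \begin{bmatrix} x^2 - x^3 \\ x^2 \\ x^3 \end{bmatrix}
= x^2 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + x^3 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}; \qquad
v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$

Theorem 5.13 implies $A$ is diagonalizable: We have shown $\dim(E_2) = 2$,
and $\dim(E_1) \geq 1$, so the eigenspaces of $A$ span $\mathbf{R}^3$. To find a 1-
eigenvector, row-reduce $A - I_3$:

$$A - I_3 = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & 1 \\ 1 & -1 & 2 \end{bmatrix}
\longrightarrow \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.$$

The system is written $x^1 = -x^3$ and $x^2 = x^3$, so

$$\begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix} = \begin{bmatrix} -x^3 \\ x^3 \\ x^3 \end{bmatrix}
= x^3 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}; \qquad v_3 = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.$$

To catch arithmetic mistakes, it’s a good idea to verify by matrix mul-
tiplication that $Av_1 = 2v_1$, $Av_2 = 2v_2$, and $Av_3 = v_3$.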

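The eigenbasis $S' = (v_j)_{j=1}^3$ diagonalizes $A$. The transition matrix
$P = [I_3]^S_{S'}$ has $j$th column $[v_j]^S$. To compute the inverse, $P^{-1} = [I_3]^{S'}_S$,
form the augmented matrix $[P \mid I_3]$ and row-reduce:

$$\left[\begin{array}{rrr|rrr} 1 & -1 & -1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right]
\xrightarrow{\;R_1 + R_3,\ R_2 - R_1,\ R_3 - R_2\;}
\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 & 1 & -1 \\ 0 & 1 & 0 & 1 & -1 & 2 \end{array}\right];$$

swapping the second and third rows gives

$$P^{-1} = \begin{bmatrix} 1 & 0 & 1 \\ 1 & -1 & 2 \\ -1 & 1 & -1 \end{bmatrix}, \qquad
P = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$

Again to guard against arithmetic errors, it’s a good idea to check that
$P^{-1}P = I_3$. On theoretical grounds, $\operatorname{diag}[2, 2, 1] = A' = P^{-1}AP$, i.e.,

$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 1 \\ 1 & -1 & 2 \\ -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & 3 \end{bmatrix}
\begin{bmatrix} 1 & -1 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$

For good measure, check this by multiplying matrices.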
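
The suggested checks are quick to automate. In the sketch below (illustrative only; the helper name `mat_mul` is an assumption), $A$ is taken to be the matrix $P\operatorname{diag}[2, 2, 1]P^{-1}$ determined by the matrices $P$ and $P^{-1}$ of the example, and integer arithmetic keeps every product exact.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 1, -1],
     [1, 1, 1],
     [1, -1, 3]]
P = [[1, -1, -1],        # columns are the eigenvectors v_1, v_2, v_3
     [1, 0, 1],
     [0, 1, 1]]
P_inv = [[1, 0, 1],
         [1, -1, 2],
         [-1, 1, -1]]

identity = mat_mul(P_inv, P)               # should be I_3
diagonal = mat_mul(P_inv, mat_mul(A, P))   # should be diag[2, 2, 1]
```

If either product comes out wrong, the error is in the row reduction for $P^{-1}$ or in one of the eigenvectors; checking $AP$ column by column isolates which.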


Example 5.32. Let $k \geq 2$ be an integer and $\lambda$ real. The $k \times k$ matrix

$$B_k(\lambda) = \lambda I_k + \sum_{j=1}^{k-1} e_j^{j+1} =
\begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{bmatrix}$$

is called the Jordan block of size $k$ with eigenvalue $\lambda$. The matrix
$N_k = B_k(0)$ is nilpotent: $N_k^j \neq 0^{k \times k}$ for $1 \leq j < k$, but $N_k^k = 0^{k \times k}$.

The eigenvalue of $B_k(\lambda)$ is $\lambda$, with algebraic multiplicity $k$. The
$\lambda$-eigenspace is easily checked to be $\operatorname{Span}(e_1)$, so $\lambda$ has geometric mul-
tiplicity 1. In particular, $B_k(\lambda)$ is not diagonalizable.

In more advanced courses, it is proven that if $V$ is finite-dimensional
and $T$ is an operator with real eigenvalues, then $V$ splits as a direct
sum of subspaces in such a way that $T$ acts as a Jordan block on each
subspace.


5.4 Symmetric Operators 


Let $N$ be a positive integer. Throughout this section, $V$ is a subspace
of $(\mathbf{R}^N, +, \cdot)$, and is equipped with the inner product coming from the
Euclidean dot product, $\langle x, y\rangle = x^T y$.

Definition 5.33. An operator $T : V \to V$ is symmetric if

$$\langle T(x), y\rangle = \langle x, T(y)\rangle \quad\text{for all $x$ and $y$ in $V$.}$$

Remark 5.34. If $(v_j)_{j=1}^n$ is a basis of $V$, $T$ is symmetric if and only if

$$\langle T(v_i), v_j\rangle = \langle v_i, T(v_j)\rangle \quad\text{for all } i, j = 1, \dots, n.$$

Symmetry implies this condition a fortiori. Conversely, if this condition
holds, then writing $x = \sum_i x^i v_i$ and $y = \sum_j y^j v_j$ and using bilinearity
of the dot product shows $T$ is symmetric.

Example 5.35. Let $V = \mathbf{R}^n$, and assume $T(x) = Ax$ for all $x$ in $V$;
that is, $A$ is the standard matrix of $T$. The operator $T$ is symmetric if
and only if

$$x^T A^T y = (Ax)^T y = \langle Ax, y\rangle = \langle x, Ay\rangle = x^T A y$$

for all $x$ and $y$ in $\mathbf{R}^n$, if and only if $A^T = A$.


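Proposition 5.36. If $W$ is a subspace of $(\mathbf{R}^n, +, \cdot)$, the orthogonal
projection map $\operatorname{proj}_W : \mathbf{R}^n \to \mathbf{R}^n$ is symmetric.

Proof. Every vector $x$ in $\mathbf{R}^n$ decomposes uniquely as $x = x_W + x_{W^\perp}$,
and the orthogonal projection operator is defined by $\operatorname{proj}_W(x) = x_W$.
If $x = x_W + x_{W^\perp}$ and $y = y_W + y_{W^\perp}$ are arbitrary vectors, then

$$\begin{aligned}
\langle \operatorname{proj}_W(x), y\rangle &= \langle x_W, y_W + y_{W^\perp}\rangle \\
&= \langle x_W, y_W\rangle \\
&= \langle x_W + x_{W^\perp}, y_W\rangle = \langle x, \operatorname{proj}_W(y)\rangle.
\end{aligned}$$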
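
For a one-dimensional subspace $W = \operatorname{Span}(v)$, the symmetry can be seen directly in coordinates. The sketch below is illustrative: it assumes the standard formula $vv^T/\langle v, v\rangle$ for the matrix of orthogonal projection onto a line, and the helper names and sample vectors are not from the text.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def projection_matrix(v):
    """Standard matrix v v^T / <v, v> of orthogonal projection onto Span(v)."""
    norm_sq = Fraction(dot(v, v))
    return [[a * b / norm_sq for b in v] for a in v]

def apply(M, x):
    return [dot(row, x) for row in M]

def mat_mul(X, Y):
    return [[dot(row, col) for col in zip(*Y)] for row in X]

v = [1, 2, 2]
Pi = projection_matrix(v)
Pi_transpose = [list(col) for col in zip(*Pi)]

x, y = [3, -1, 4], [0, 5, 2]
left = dot(apply(Pi, x), y)     # <proj(x), y>
right = dot(x, apply(Pi, y))    # <x, proj(y)>
```

The matrix `Pi` equals its transpose, in line with Example 5.35, and `left` and `right` agree for every choice of `x` and `y`.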
Proposition 5.37. If $T_1$ and $T_2$ are symmetric, and if $c$ is real, then
$cT_1 + T_2$ is symmetric.

Proof. Exercise 5.3.

Remark 5.38. In particular, the set of symmetric operators is a vector
subspace of the set $L(V, V)$ of all operators. Further, an arbitrary
linear combination of orthogonal projections is symmetric; the Spectral
Theorem, below, asserts the converse.

Proposition 5.39. Let $S' = (u_j)_{j=1}^n$ be an orthonormal basis of $V$. If
$T$ is an operator on $V$, then $T$ is symmetric if and only if the matrix
$[T]^{S'}_{S'}$ is a symmetric matrix.

Proof. The matrix $A = [A^i_j]$ of $T$ with respect to $S'$ is defined by

$$T(u_j) = \sum_{k=1}^n A^k_j u_k.$$

Since $\langle u_k, u_i\rangle = \delta_{ik}$, we have

$$\langle T(u_j), u_i\rangle = \sum_{k=1}^n A^k_j \langle u_k, u_i\rangle = \sum_{k=1}^n A^k_j \delta_{ik} = A^i_j.$$

Exchanging the roles of $i$ and $j$ gives $\langle u_j, T(u_i)\rangle = \langle T(u_i), u_j\rangle = A^j_i$.
The operator $T$ is symmetric if and only if

$$A^i_j = \langle T(u_j), u_i\rangle = \langle u_j, T(u_i)\rangle = A^j_i \quad\text{for all $i$ and $j$,}$$

if and only if $[A^i_j] = [T]^{S'}_{S'}$ is a symmetric matrix.

Remark 5.40. If $S' = (v_j)_{j=1}^n$ is an arbitrary basis and $T$ is symmetric,
the expression

$$\sum_{k=1}^n A^k_j \langle v_k, v_i\rangle$$

is symmetric in $i$ and $j$, but $[A^i_j] = [T]^{S'}_{S'}$ may not be symmetric.
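
Theorem 5.41. Let $A$ be a real, symmetric $n \times n$ matrix.

(i) Every eigenvalue of $A$ is real.

(ii) Distinct eigenspaces of $A$ are orthogonal.

Proof. (i). Let $\lambda = \alpha + i\beta$ be an eigenvalue of $A$. It follows that the
complex conjugate $\bar\lambda = \alpha - i\beta$ is also an eigenvalue of $A$. Consequently,
the real matrix

$$\widetilde{A} = (A - \lambda I_n)(A - \bar\lambda I_n) = A^2 - 2\alpha A + (\alpha^2 + \beta^2)I_n
= (A - \alpha I_n)^2 + (\beta I_n)^2$$

is not invertible. Since $A - \alpha I_n$ is symmetric, if $x$ is a non-zero vector
in the nullspace of $\widetilde{A}$, then

$$\begin{aligned}
0 = \langle \widetilde{A}x, x\rangle &= \langle (A - \alpha I_n)^2 x, x\rangle + \langle (\beta I_n)^2 x, x\rangle \\
&= \langle (A - \alpha I_n)x, (A - \alpha I_n)x\rangle + \langle \beta x, \beta x\rangle \\
&= \|(A - \alpha I_n)x\|^2 + \|\beta x\|^2.
\end{aligned}$$

By positive definiteness, $\beta = 0$ and $Ax = \alpha x$.

(ii). If $\lambda^1 \neq \lambda^2$ are eigenvalues of $A$, with corresponding eigenvectors
$u_1$ and $u_2$, then

$$\langle \lambda^1 u_1, u_2\rangle = \langle Au_1, u_2\rangle = \langle u_1, Au_2\rangle = \langle u_1, \lambda^2 u_2\rangle,$$

or $(\lambda^1 - \lambda^2)\langle u_1, u_2\rangle = 0$. Since $\lambda^1 - \lambda^2 \neq 0$, we have $\langle u_1, u_2\rangle = 0$.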


The Spectral Theorem 


Theorem 5.42. Let $V$ be a subspace of $(\mathbf{R}^N, +, \cdot)$. If $T : V \to V$ is
a symmetric linear operator, there exists an orthonormal $T$-eigenbasis
of $V$.

Proof. The proof proceeds by induction on the dimension of $V$. If
$\dim V = 1$, then $T$ is a scalar operator, and an arbitrary unit vector
in $V$ is an orthonormal eigenbasis.

Assume inductively for some $k \geq 1$ that the conclusion of the theo-
rem holds for every subspace of dimension $k$, and let $V$ be a subspace
of dimension $(k + 1)$. Since $T$ is symmetric, its eigenvalues are real.
Pick an eigenvalue $\lambda^{k+1}$ and a unit $\lambda^{k+1}$-eigenvector $u_{k+1}$ in $V$, and
let $V' = \operatorname{Span}(u_{k+1})^\perp$ be the orthogonal complement of $u_{k+1}$ in $V$.

We claim $T$ maps $V'$ to $V'$: A vector $y$ in $V$ is in $V'$ if and only if
$\langle u_{k+1}, y\rangle = 0$. Since $T$ is symmetric and $u_{k+1}$ is an eigenvector of $T$,

$$\langle u_{k+1}, T(y)\rangle = \langle T(u_{k+1}), y\rangle = \langle \lambda^{k+1}u_{k+1}, y\rangle = 0;$$

that is, $T(y) \in V'$. Since $\dim V' = k$, the inductive hypothesis guaran-
tees there exists an orthonormal $T$-eigenbasis $(u_j)_{j=1}^k$ of $V'$. Append-
ing $u_{k+1}$ gives an orthonormal $T$-eigenbasis $(u_j)_{j=1}^{k+1}$ of $V$.

Corollary 5.43. If $A$ is a symmetric $n \times n$ real matrix, there exists an
orthogonal $n \times n$ matrix $P$ such that $P^T AP$ is diagonal.

Theorem 5.42 has a geometric formulation:

Theorem 5.44. Let $V$ be a subspace of $(\mathbf{R}^N, +, \cdot)$. If $T : V \to V$ is a
symmetric operator, $\lambda^1 > \lambda^2 > \cdots > \lambda^r$ are the eigenvalues of $T$, and
$\Pi_j : V \to V$ is the orthogonal projection to the eigenspace $E_{\lambda^j}$, then:

(i) $T = \sum_{j=1}^r \lambda^j \Pi_j$.

(ii) $I_V = \sum_{j=1}^r \Pi_j$.

Remark 5.45. (ii) recapitulates Theorem 3.25.
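
As a concrete instance of the geometric formulation of the Spectral Theorem, take the symmetric matrix with rows $(a, b)$, $(b, a)$ from Example 5.10, whose eigenspaces are spanned by $(1, 1)$ and $(1, -1)$. The sketch below is illustrative (the sample values $a = 3$, $b = 2$ and the helper names are assumptions); it checks that the operator is the eigenvalue-weighted sum of the two orthogonal projections, and that the projections sum to the identity.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def combine(c1, X, c2, Y):
    """The linear combination c1 * X + c2 * Y of two matrices."""
    return [[c1 * x + c2 * y for x, y in zip(r, s)] for r, s in zip(X, Y)]

a, b = 3, 2
T = [[a, b], [b, a]]

half = Fraction(1, 2)
Pi_1 = [[half, half], [half, half]]       # orthogonal projection onto Span((1, 1))
Pi_2 = [[half, -half], [-half, half]]     # orthogonal projection onto Span((1, -1))

recombined = combine(a + b, Pi_1, a - b, Pi_2)   # (a + b) Pi_1 + (a - b) Pi_2
identity = combine(1, Pi_1, 1, Pi_2)             # Pi_1 + Pi_2
product = mat_mul(Pi_1, Pi_2)                    # zero: the eigenspaces are orthogonal
```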


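Example 5.46. (Symmetric rank-one operators on $\mathbf{R}^n$) Recall that a
rank-one operator $T$ on $\mathbf{R}^n$ has standard matrix $A = wv^T$ for some
non-zero vectors $v$ and $w$ in $\mathbf{R}^n$.

The matrix $A$ is symmetric if and only if

$$wv^T = A = A^T = vw^T,$$

if and only if $v^i w^j - v^j w^i = 0$ for all $i$, $j$, if and only if $v$ and $w$ are
proportional. In this event, by Example 5.15 the eigenspaces are
$E_0 = \operatorname{Span}(v)^\perp$, of dimension $(n - 1)$, and $E_{\langle v, w\rangle} = \operatorname{Span}(v)$, of
dimension 1. When $w = v$, the non-zero eigenvalue is $\|v\|^2$.

In agreement with Theorem 5.44, $T$ is $\langle v, w\rangle$ times orthogonal pro-
jection to $\operatorname{Span}(v)$; when $w = v$, this factor is $\|v\|^2$.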


5.5 Applications 


For computing powers of a square matrix, diagonal matrices are for-
mally simpler than general matrices. This section presents a selection
of examples. Key properties are gathered here for reference; each is
easily established with mathematical induction.

Proposition 5.47. Let $n \geq 1$ be an integer. If $A$, $B$, and $P$ are $n \times n$
matrices, with $P$ invertible, then for all $m \geq 1$:

(i) If $A' = P^{-1}AP$, then $(A')^m = P^{-1}A^m P$.

(ii) If $AB = BA$, then $(AB)^m = A^m B^m$.

(iii) If $(d^j)_{j=1}^n$ are real numbers, and if $A' = \operatorname{diag}[d^1, d^2, \dots, d^n]$, then
$(A')^m = \operatorname{diag}[(d^1)^m, (d^2)^m, \dots, (d^n)^m]$.


The Fibonacci Numbers 


The Fibonacci sequence is the real sequence $(x^m)_{m=1}^\infty$ defined by the
two-term recursion

$$x^1 = x^2 = 1, \qquad x^{m+2} = x^m + x^{m+1} \quad\text{for } m \geq 1.$$

The first fifteen terms are 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144,
233, 377, 610. An obvious inductive argument proves that each
Fibonacci number is an integer. There is a remarkable closed formula,
known to De Moivre but usually called “Binet’s formula”.

Theorem 5.48. Let $\lambda^+ = \tfrac12(1 + \sqrt5)$ and $\lambda^- = \tfrac12(1 - \sqrt5)$. For $m \geq 1$,

$$x^m = \frac{(\lambda^+)^m - (\lambda^-)^m}{\sqrt5} = \frac{(1 + \sqrt5)^m - (1 - \sqrt5)^m}{2^m\sqrt5}.$$


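Proof. For $m \geq 1$, define the vector $\mathbf{x}_m = (x^m, x^{m+1})$ in $\mathbf{R}^2$. The
Fibonacci recursion relation may be expressed in matrix form as

$$\mathbf{x}_{m+1} = \begin{bmatrix} x^{m+1} \\ x^{m+2} \end{bmatrix}
= \begin{bmatrix} x^{m+1} \\ x^m + x^{m+1} \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x^m \\ x^{m+1} \end{bmatrix} = A\mathbf{x}_m,$$

with $A$ denoting the indicated $2 \times 2$ matrix. By induction on $m$,

$$\begin{bmatrix} x^m \\ x^{m+1} \end{bmatrix} = \mathbf{x}_m = A^{m-1}\mathbf{x}_1 = A^{m-1}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{for all } m \geq 1.$$

To calculate $A^{m-1}$ in closed form, diagonalize $A$. The characteristic
equation is

$$0 = \det(A - \lambda I_2) = \begin{vmatrix} -\lambda & 1 \\ 1 & 1 - \lambda \end{vmatrix} = \lambda^2 - \lambda - 1.$$

The quadratic formula gives the eigenvalues of $A$: $\lambda^+ = \tfrac12(1 + \sqrt5)$ and
$\lambda^- = \tfrac12(1 - \sqrt5)$. The subsequent computation is done symbolically,
using the identities

$$\lambda^+ + \lambda^- = 1, \qquad \lambda^+ - \lambda^- = \sqrt5, \qquad \lambda^+\lambda^- = -1, \qquad (\lambda^\pm)^2 = \lambda^\pm + 1.$$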


AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31 




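Corresponding eigenvectors are

$$v_1 = \begin{bmatrix} 1 \\ \lambda^+ \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1 \\ \lambda^- \end{bmatrix}.$$

As guaranteed by Theorem 5.41, the eigenvalues of $A$ are real and the
corresponding eigenvectors are orthogonal.

Let $S = (e_j)_{j=1}^2$ be the standard basis of $\mathbf{R}^2$ and $S' = (v_j)_{j=1}^2$ the
$A$-eigenbasis. The transition matrix $P = [I_2]^S_{S'}$ and its inverse are

$$P = \begin{bmatrix} 1 & 1 \\ \lambda^+ & \lambda^- \end{bmatrix}, \qquad
P^{-1} = \frac{1}{\lambda^- - \lambda^+}\begin{bmatrix} \lambda^- & -1 \\ -\lambda^+ & 1 \end{bmatrix}
= \frac{1}{\sqrt5}\begin{bmatrix} -\lambda^- & 1 \\ \lambda^+ & -1 \end{bmatrix}.$$

(Note that $P^{-1} \neq P^T$; the columns of $P$ are not of unit length.)

If $T : \mathbf{R}^2 \to \mathbf{R}^2$ is the operator whose standard matrix is $A$, then

$$A = [T]^S_S = [I_2]^S_{S'}\,[T]^{S'}_{S'}\,[I_2]^{S'}_S = PA'P^{-1}.$$

Writing $A' = \operatorname{diag}[\lambda^+, \lambda^-]$ and remembering $\lambda^+\lambda^- = -1$,

$$\begin{aligned}
A^{m-1} = P(A')^{m-1}P^{-1}
&= \frac{1}{\sqrt5}\begin{bmatrix} 1 & 1 \\ \lambda^+ & \lambda^- \end{bmatrix}
\begin{bmatrix} (\lambda^+)^{m-1} & 0 \\ 0 & (\lambda^-)^{m-1} \end{bmatrix}
\begin{bmatrix} -\lambda^- & 1 \\ \lambda^+ & -1 \end{bmatrix} \\
&= \frac{1}{\sqrt5}\begin{bmatrix}
(\lambda^+)^{m-2} - (\lambda^-)^{m-2} & (\lambda^+)^{m-1} - (\lambda^-)^{m-1} \\
(\lambda^+)^{m-1} - (\lambda^-)^{m-1} & (\lambda^+)^{m} - (\lambda^-)^{m}
\end{bmatrix}.
\end{aligned}$$

Figure 5.1: The action of $T$ on $S$, and on an orthonormal eigenbasis.

Consequently,

$$\begin{bmatrix} x^m \\ x^{m+1} \end{bmatrix} = A^{m-1}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \frac{1}{\sqrt5}\begin{bmatrix}
\bigl((\lambda^+)^{m-2} + (\lambda^+)^{m-1}\bigr) - \bigl((\lambda^-)^{m-2} + (\lambda^-)^{m-1}\bigr) \\
\bigl((\lambda^+)^{m-1} + (\lambda^+)^{m}\bigr) - \bigl((\lambda^-)^{m-1} + (\lambda^-)^{m}\bigr)
\end{bmatrix}.$$

In particular, since $1 + \lambda^\pm = (\lambda^\pm)^2$,

$$x^m = \frac{(\lambda^+)^{m-2}(1 + \lambda^+) - (\lambda^-)^{m-2}(1 + \lambda^-)}{\sqrt5}
= \frac{(\lambda^+)^m - (\lambda^-)^m}{\sqrt5}.$$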


Remark 5.49. Each power in the numerator may be expanded using 
the binomial theorem. It’s a good exercise to simplify the numerator in 
this way, and to prove the formula really does generate only integers. 
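One way to explore the remark numerically: in the binomial expansions of $(1 + \sqrt{5})^m$ and $(1 - \sqrt{5})^m$ the even powers of $\sqrt{5}$ cancel and the odd powers double, which suggests $x^m = 2^{-(m-1)} \sum_{i} \binom{m}{2i+1} 5^i$. The sketch below (function name `fib_binomial` is hypothetical; `math.comb` assumes Python 3.8 or later) evaluates this in exact integer arithmetic and confirms that the division leaves no remainder for the tested values. It is an experiment, not a proof of integrality.

```python
from math import comb

def fib_binomial(m):
    """x^m from the binomial expansion of the closed formula (m >= 1)."""
    # Sum over odd indices 2i + 1 <= m of C(m, 2i + 1) * 5**i.
    numerator = sum(comb(m, 2 * i + 1) * 5**i for i in range((m + 1) // 2))
    quotient, remainder = divmod(numerator, 2 ** (m - 1))
    assert remainder == 0, "expected the power of 2 to divide the sum"
    return quotient

print([fib_binomial(m) for m in range(1, 11)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```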


Matrix Exponentiation 


Definition 5.50. The trace of an $n \times n$ matrix $A$ is the sum of its
diagonal entries,

\[
  \operatorname{tr}(A) = A^1_1 + \dots + A^n_n = \sum_{j=1}^{n} A^j_j.
\]


Proposition 5.51. Let A and B be n x n matrices. 
(i) tr(BA) = tr(AB). 
(ii) If $P$ is invertible, then $\operatorname{tr}(P^{-1}AP) = \operatorname{tr}(A)$.


(iii) If A is diagonalizable, then tr(A) is the sum of the eigenvalues, 
counting multiplicity. 


Proof. (i). Since $(AB)^i_j = \sum_{k=1}^{n} A^i_k B^k_j$,

\[
  \operatorname{tr}(AB) = \sum_{i=1}^{n} \sum_{k=1}^{n} A^i_k B^k_i
                        = \sum_{k=1}^{n} \sum_{i=1}^{n} B^k_i A^i_k
                        = \operatorname{tr}(BA).
\]

(ii). By (i), $\operatorname{tr}(P^{-1}(AP)) = \operatorname{tr}((AP)P^{-1}) = \operatorname{tr}(A)$.

(iii). Suppose $A' = P^{-1}AP$. By (ii), $\operatorname{tr}(A) = \operatorname{tr}(A')$.

Since $A'$ is diagonal, the eigenvalues of $A'$ are its diagonal entries,
so $\operatorname{tr}(A')$ is the sum of the eigenvalues, say $\lambda_{A'}$.

The matrices $A$ and $A'$ have the same characteristic polynomial,
hence the same eigenvalues. In particular, $\lambda_{A'} = \lambda_A$, the sum of the
eigenvalues of $A$.

Combining these observations, $\operatorname{tr}(A) = \operatorname{tr}(A') = \lambda_{A'} = \lambda_A$.
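The three parts of the proposition can be checked on a small example. The sketch below uses exact rational arithmetic; the matrices `A`, `B`, `P` and the helper names are chosen here for illustration and do not come from the text. `A` is symmetric with eigenvalues 3 and 1, and the columns of `P` are corresponding eigenvectors.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Matrix product of nested lists X (p x q) and Y (q x r)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[j][j] for j in range(len(X)))

A = [[2, 1], [1, 2]]          # eigenvalues 3 and 1
B = [[0, 1], [4, -1]]         # an arbitrary second matrix
P = [[1, 1], [1, -1]]         # columns (1, 1) and (1, -1) are eigenvectors of A
half = Fraction(1, 2)
P_inv = [[half, half], [half, -half]]

A_diag = mat_mul(P_inv, mat_mul(A, P))   # P^{-1} A P, equal to diag[3, 1]

print(trace(mat_mul(A, B)), trace(mat_mul(B, A)))   # part (i): 3 3
print(trace(A_diag), trace(A))                       # parts (ii), (iii): 4 4
```

Note that `AB` and `BA` are different matrices here; only their traces agree.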


Definition 5.52. Let $A = [A^i_j]$ be an $n \times n$ real (or complex) matrix.
The exponential series is defined by

\[
  \exp(tA) = \sum_{k=0}^{\infty} \frac{(tA)^k}{k!}
           = I + tA + \frac{t^2 A^2}{2!} + \frac{t^3 A^3}{3!} + \cdots.
\]


Remark 5.53. Since there are only finitely many entries of $A$, there is a
real number $M$ such that $|A^i_j| \le M$ for all $i$ and $j$. By induction on $k$,
$|(A^k)^i_j| \le (Mn)^k$ for all $k \ge 0$; for example, if $k = 2$, we have

\[
  |(A^2)^i_j| = \Bigl| \sum_{\ell=1}^{n} A^i_\ell A^\ell_j \Bigr|
              \le \sum_{\ell=1}^{n} |A^i_\ell A^\ell_j|
              \le M^2 n \le (Mn)^2.
\]

It follows that each entry in the exponential series is bounded in absolute
value by the convergent series

\[
  \sum_{k=0}^{\infty} \frac{(|t| M n)^k}{k!} = e^{|t| M n}.
\]
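Because the series converges entrywise, a partial sum gives a practical approximation of $\exp(tA)$. The following is a minimal sketch, assuming a fixed truncation at 40 terms (an arbitrary choice that is ample for small matrices with modest entries, not a tuned algorithm); the function name `exp_series` and the sample matrix are illustrative. It also compares the computed entries with the bound $e^{|t|Mn}$ from the remark.

```python
import math

def mat_mul(X, Y):
    """Product of two n x n matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, t, terms=40):
    """Partial sum of the exponential series: (tA)^k / k! for k < terms."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in total]
    for k in range(1, terms):
        # (tA)^k / k! is the previous term times A, scaled by t / k.
        term = [[t * entry / k for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total

A = [[1, 2], [3, 4]]
t = 0.5
n = len(A)
M = max(abs(entry) for row in A for entry in row)
bound = math.exp(abs(t) * M * n)

E = exp_series(A, t)
print(E)
print(all(abs(entry) <= bound for row in E for entry in row))
```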


Example 5.54. If $A = \operatorname{diag}[d^1, d^2, \dots, d^n]$, then $A^m$ is the diagonal
matrix whose entries are the $m$th powers of the entries of $A$, so

\[
  \exp(tA) = \operatorname{diag}[e^{td^1}, e^{td^2}, \dots, e^{td^n}].
\]


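Example 5.55. If $N$ is the $4 \times 4$ nilpotent block, its fourth power
is $0^{4 \times 4}$, so $N$, $N^2$, $N^3$, and $\exp(tN)$ are

\[
  \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
  \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
  \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
  \begin{bmatrix}
    1 & t & \frac{t^2}{2} & \frac{t^3}{6} \\
    0 & 1 & t & \frac{t^2}{2} \\
    0 & 0 & 1 & t \\
    0 & 0 & 0 & 1
  \end{bmatrix}.
\]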
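Example 5.56. If $N = B_n(0)$ is the $n \times n$ nilpotent block, then $N^m$ has
a diagonal of 1's $m$ rows above the diagonal, and $N^n = 0^{n \times n}$. The
exponential series is a polynomial, and

\[
  \exp(tN) =
  \begin{bmatrix}
    1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{n-2}}{(n-2)!} & \frac{t^{n-1}}{(n-1)!} \\
    0 & 1 & t & \cdots & \frac{t^{n-3}}{(n-3)!} & \frac{t^{n-2}}{(n-2)!} \\
    \vdots & & \ddots & \ddots & & \vdots \\
    0 & 0 & 0 & \cdots & t & \frac{t^2}{2!} \\
    0 & 0 & 0 & \cdots & 1 & t \\
    0 & 0 & 0 & \cdots & 0 & 1
  \end{bmatrix}.
\]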
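Since $N^n = 0^{n \times n}$, the exponential of a nilpotent block is a finite sum and can be computed exactly. The sketch below builds $\exp(tN)$ from the first $n$ terms of the series and compares it with the entry formula $t^{j-i}/(j-i)!$ for $j \ge i$ read off from the display above. The function names are illustrative, and indices in the code start at 0 rather than 1.

```python
from fractions import Fraction
from math import factorial

def mat_mul(X, Y):
    """Product of two n x n matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotent_block(n):
    """n x n matrix with 1's directly above the diagonal, 0's elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

def exp_nilpotent(n, t):
    """exp(tN) as the finite sum of t^k N^k / k! for k = 0, ..., n - 1."""
    N = nilpotent_block(n)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # N^0 = I
    total = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        coeff = Fraction(t) ** k / factorial(k)
        total = [[total[i][j] + coeff * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = mat_mul(power, N)
    return total

def exp_formula(n, t):
    """Entry (i, j) is t^(j-i) / (j-i)! on and above the diagonal, else 0."""
    return [[Fraction(t) ** (j - i) / factorial(j - i) if j >= i else Fraction(0)
             for j in range(n)] for i in range(n)]

print(exp_nilpotent(4, 2) == exp_formula(4, 2))   # True
print(mat_mul(mat_mul(nilpotent_block(3), nilpotent_block(3)),
              nilpotent_block(3)))                # the 3 x 3 zero matrix
```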


Theorem 5.57. Let A and B be n x n matrices. 

(i) If P is an invertible n x n matrix, then 

\[
  \exp(tP^{-1}AP) = P^{-1} \exp(tA) P.
\]

(ii) If $BA = AB$, then $\exp(t(A + B)) = \exp(tA) \exp(tB)$.
(iii) If $A$ is diagonalizable, then $\det \exp(tA) = e^{t \operatorname{tr}(A)}$.
(iv) $\exp(tA^T) = [\exp(tA)]^T$.

(v) If $A^T = -A$, then $\exp(tA)$ is orthogonal and has determinant 1.
Proof. (i). This is a consequence of the identity $(P^{-1}AP)^k = P^{-1}A^kP$:

\[
  \exp(tP^{-1}AP) = \sum_{k=0}^{\infty} \frac{t^k (P^{-1}AP)^k}{k!}
                  = P^{-1} \Bigl( \sum_{k=0}^{\infty} \frac{(tA)^k}{k!} \Bigr) P
                  = P^{-1} \exp(tA) P.
\]


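(ii). This follows from the binomial formula

\[
  \frac{(A + B)^k}{k!}
  = \frac{1}{k!} \sum_{i=0}^{k} \binom{k}{i} A^i B^{k-i}
  = \sum_{i=0}^{k} \frac{A^i}{i!} \frac{B^{k-i}}{(k-i)!}
  = \sum_{i+j=k} \frac{A^i}{i!} \frac{B^j}{j!},
\]

which holds for commuting matrices, together with the "Cauchy product
formula" for absolutely convergent double series:

\[
\begin{aligned}
  \exp(t(A + B))
  &= \sum_{k=0}^{\infty} \frac{t^k (A + B)^k}{k!}
   = \sum_{k=0}^{\infty} \sum_{i+j=k} \frac{(tA)^i}{i!} \frac{(tB)^j}{j!} \\
  &= \Bigl( \sum_{i=0}^{\infty} \frac{(tA)^i}{i!} \Bigr)
     \Bigl( \sum_{j=0}^{\infty} \frac{(tB)^j}{j!} \Bigr)
   = \exp(tA) \exp(tB).
\end{aligned}
\]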
(iii). Suppose $P^{-1}AP = A' = \operatorname{diag}[d^1, \dots, d^n]$ for some invertible
matrix $P$. By (i) and Example 5.54,

\[
  P^{-1} \exp(tA) P = \exp(tA') = \operatorname{diag}[e^{td^1}, e^{td^2}, \dots, e^{td^n}].
\]

Taking determinants,

\[
  \det \exp(tA) = \det \exp(tA') = \prod_{k=1}^{n} e^{td^k}
                = e^{t \sum_{k=1}^{n} d^k}
                = e^{t \operatorname{tr}(A')} = e^{t \operatorname{tr}(A)}.
\]


(iv). This is an immediate consequence of $(A^k)^T = (A^T)^k$.

(v). Clearly $\exp(0^{n \times n}) = I_n$, so by (ii), $\exp(-tA) = \exp(tA)^{-1}$. If
$A^T = -A$, then

\[
  \exp(tA)^{-1} = \exp(-tA) = \exp(tA^T) = \exp(tA)^T,
\]

which proves $\exp(tA)$ is orthogonal.
Since the diagonal entries of a skew-symmetric matrix are all 0,
$\operatorname{tr}(A) = 0$. By (iii), $\det \exp(tA) = e^{t \operatorname{tr}(A)} = e^0 = 1$.
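Parts (ii) and (iii) lend themselves to a numerical experiment. The sketch below, which approximates the exponential by a 40-term partial sum (an assumption made here for simplicity), checks that $\exp(t(A + B)) = \exp(tA)\exp(tB)$ for a commuting pair, shows the identity failing for a non-commuting pair, and compares $\det \exp(tA)$ with $e^{t \operatorname{tr}(A)}$. The matrices and helper names are illustrative choices, not taken from the text.

```python
import math

def mat_mul(X, Y):
    """Product of two n x n matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def exp_series(A, t, terms=40):
    """Partial sum of the exponential series: (tA)^k / k! for k < terms."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[t * entry / k for entry in row] for row in mat_mul(term, A)]
        total = mat_add(total, term)
    return total

def close(X, Y, tol=1e-9):
    """True if all corresponding entries differ by at most tol."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[2, 1], [1, 2]]
B = [[0, 1], [1, 0]]      # commutes with A, since A = 2I + B
C = [[0, 1], [0, 0]]
D = [[0, 0], [1, 0]]      # CD != DC
t = 0.3

commuting_ok = close(exp_series(mat_add(A, B), t),
                     mat_mul(exp_series(A, t), exp_series(B, t)))
noncommuting_ok = close(exp_series(mat_add(C, D), t),
                        mat_mul(exp_series(C, t), exp_series(D, t)))

E = exp_series(A, t)
det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]

print(commuting_ok, noncommuting_ok)              # True False
print(det_E, math.exp(t * (A[0][0] + A[1][1])))   # the two values agree
```

The failure for `C` and `D` illustrates why the hypothesis $BA = AB$ in part (ii) cannot simply be dropped.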


Remark 5.58. An orthogonal n x n matrix of determinant 1 is, by 
definition, a Euclidean rotation. Part (v) of the theorem says that a 
skew-symmetric matrix is an “infinitesimal” rotation. 


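Example 5.59. If

\[
  A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},
  \quad\text{then}\quad
  \exp(tA) = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}.
\]

To prove this, note that $A^2 = -I_2$, from which it follows immediately
that $A^{2k} = (-1)^k I_2$ and $A^{2k+1} = (-1)^k A$. Splitting the exponential
series into terms of even and odd degree and recalling the power series
for the circular functions,

\[
  \cos t = \sum_{k=0}^{\infty} \frac{(-1)^k t^{2k}}{(2k)!}, \qquad
  \sin t = \sum_{k=0}^{\infty} \frac{(-1)^k t^{2k+1}}{(2k+1)!},
\]

we have

\[
\begin{aligned}
  \exp(tA)
  &= \sum_{k=0}^{\infty} \frac{(tA)^{2k}}{(2k)!}
   + \sum_{k=0}^{\infty} \frac{(tA)^{2k+1}}{(2k+1)!} \\
  &= \Bigl( \sum_{k=0}^{\infty} \frac{(-1)^k t^{2k}}{(2k)!} \Bigr) I_2
   + \Bigl( \sum_{k=0}^{\infty} \frac{(-1)^k t^{2k+1}}{(2k+1)!} \Bigr) A \\
  &= (\cos t) I_2 + (\sin t) A.
\end{aligned}
\]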
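The example can be confirmed numerically by summing the exponential series for this $A$ and comparing with the rotation matrix. This is a minimal sketch using a 40-term partial sum; the helper names and the sample value of $t$ are assumptions made for illustration. It also checks that the result $E$ satisfies $E^T E = I_2$, as part (v) of Theorem 5.57 predicts for a skew-symmetric $A$.

```python
import math

def mat_mul(X, Y):
    """Product of two n x n matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, t, terms=40):
    """Partial sum of the exponential series: (tA)^k / k! for k < terms."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[t * entry / k for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total

A = [[0, -1], [1, 0]]     # skew-symmetric, with A^2 = -I
t = 1.0

E = exp_series(A, t)
R = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

E_transpose = [[E[j][i] for j in range(2)] for i in range(2)]
gram = mat_mul(E_transpose, E)    # should be close to the identity

print(E)
print(R)
print(gram)
```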
Exercises


Exercise 5.1. Find the eigenvalues and bases for the eigenspaces of 
the following matrices, and determine which are diagonalizable: 


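\[
  \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 1 & 0 & 0 & 4 \end{bmatrix}, \quad
  \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 1 & 0 & 0 & 2 \end{bmatrix}, \quad
  \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 1 & 2 \end{bmatrix}, \quad
  \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
\]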


Exercise 5.2. On the vector space $V = \mathcal{F}(\mathbf{R}, \mathbf{R})$ of all real-valued
functions on $\mathbf{R}$ under function addition and scalar multiplication, let
$R$ be the "domain reflection" operator defined by $(Rf)(t) = f(-t)$.


(a) Show R is a linear involution. 


(b) Identify the eigenspaces of R, and show that every function can be 
written uniquely as a sum of eigenvectors. 


(c) Find the decomposition of $f(t) = e^t$ into eigenvectors.


Exercise 5.3. Prove Proposition 5.37.


Exercise 5.4. An operator $\Pi : V \to V$ is a projection if $\Pi^2 = \Pi$.


(a) Prove that if $\Pi$ is a projection, then $I_V - \Pi$ is a projection, and
the only eigenvalues of $\Pi$ are 0 and 1.


(b) Assuming "optimistically" that every vector $v$ in $V$ decomposes as
a sum $v^0 + v^1$ of eigenvectors, find formulas for $v^0$ and $v^1$ in terms
of $v$ and $\Pi(v)$.


(c) Show that the vectors $v^0$ and $v^1$ found in part (b) are eigenvectors
of $\Pi$. Conclude that $\Pi$ is diagonalizable.


(d) Find a projection on $\mathbf{R}^2$ whose standard matrix is not symmetric.


Exercise 5.5. If $\Pi : V \to V$ is a projection (preceding exercise), the
operator $R = I_V - 2\Pi$ is a reflection.


(a) Prove that a reflection is an involution.


(b) How are the eigenvalues and eigenspaces of $\Pi$ related to the
eigenspaces of $R$?


(c) Find a reflection on $\mathbf{R}^2$ whose standard matrix is not symmetric.


Exercise 5.6. Let $T : V \to V$ be an operator satisfying $T^3 = T$.


(a) Find the possible eigenvalues of $T$.


(b) Assuming "optimistically" that every vector $v$ in $V$ decomposes as
a sum of eigenvectors, find formulas for each summand in terms
of $v$, $T(v)$, and $T^2(v)$.


(c) Show that the vectors found in part (b) are eigenvectors of $T$.
Conclude that $T$ is diagonalizable.


Exercise 5.7. Let $A$ be a non-zero $n \times n$ matrix satisfying $A^k = 0^{n \times n}$
for some integer $2 \le k \le n$.


(a) Prove that every eigenvalue of $A$ is 0.


(b) If $A^3 = 0^{n \times n}$, prove $(I + A)^{-1} = I - A + A^2$.


(c) Find a formula for $(I + A)^{-1}$ if $A^k = 0^{n \times n}$ for some $k \ge 3$.


(d) Prove $I + A$ is not diagonalizable.
Hint: What are the eigenvalues?


Exercise 5.8. Let $S = (v_j)_{j=1}^{n}$ be a basis of $\mathbf{R}^n$. Prove $S$ is orthonormal
if and only if $v_1 v_1^T + \dots + v_n v_n^T = I_n$.


Exercise 5.9. Let $B$ be an arbitrary $n \times n$ real matrix, and put $A = B^T B$,
noting that $A$ is symmetric. Prove that every eigenvalue of $A$
is non-negative, and 0 is an eigenvalue of $A$ if and only if $B$ is not
invertible.

Hint: First show that $\langle x, Ax \rangle = \|Bx\|^2$ for every $x$ in $\mathbf{R}^n$.


Exercise 5.10. Let a, b, and c be real numbers. 


(a) Find the eigenvalues and eigenspaces of $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$.


(b) Prove (by direct computation) that the eigenspaces of $A$ are
orthogonal.


(c) Prove the eigenvalues have opposite sign if and only if $ac - b^2 < 0$.


(d) Prove the eigenvalues have the same sign if and only if $ac - b^2 > 0$,
and their mutual sign is the sign of $a$.
Exercise 5.11. Let $A = \begin{bmatrix} 0 & 2 \\ 2 & 3 \end{bmatrix}$. Find a formula for $A^n$, $n \ge 0$ an
arbitrary integer.


Exercise 5.12. Show that if $A$ is skew-symmetric and $\lambda$ is a real
eigenvalue, then $\lambda = 0$.
Hint: Let $v$ be a $\lambda$-eigenvector, and compute $\langle Av, v \rangle$.


Exercise 5.13. Let $A$ and $B$ be $n \times n$ matrices. We say $A$ and $B$
are simultaneously diagonalizable if there exists an invertible matrix $P$
such that $A' = P^{-1}AP$ and $B' = P^{-1}BP$ are diagonal.

Prove that simultaneously diagonalizable matrices commute, i.e.,
$BA = AB$.
Hint: What can you say about the product of diagonal matrices?


Exercise 5.14. Let A and B be commuting n x n matrices. 


(a) Show that if $v$ is a $\lambda$-eigenvector of $A$, then $Bv$ is in the
$\lambda$-eigenspace of $A$. That is, $B$ preserves the eigenspaces of $A$.


(b) Prove that if $A$ is diagonalizable and $B$ is invertible, then $A$ and
$B$ are simultaneously diagonalizable.
Hint: Show that $A$ and $B^{-1}$ commute, then use part (a) to show
the eigenspaces of $B$ are precisely the eigenspaces of $A$.


(c) Remove the hypothesis in part (b) that $B$ is invertible.
Suggestion: $B$ is diagonalized by $P$ if and only if $B + cI_n$ is; show
that there exists a real number $c$ so that $B + cI_n$ is invertible.